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Summary  of  Program  for 

Reporting  Period 

Program  Objectlvea 

To  develop  practical,  real  time  methods  for  suporessing 
noise  which  has  been  acoustically  added  to  speech. 

To  demonstrate  that  through  the  incorporation  of  the 
noise  suppression  methods,  speech  can  be  effectively 
analyzed  for  narrow  band  digital  transmission  in  practical 
operating  environments. 


Summary  of  Tasks  and  Results 

Introduction 

This  semi-annual  technical  report  describes  the  current 
status  in  five  research  areas  for  the  period  1 April  1977 
through  30  September  1977. 

Application  of  the  SABER  method  for  Improved  Spectral 
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Analysis  of  Noisy  Speech-Steven  F.  Boll. 


A method  is  developed  for  reducing?  the  effect  of 
acoustically  added  background  noise  when  spectrally 
analyzing  speech  using  Linear  Prediction.  Fundamental  to 
the  method  is  the  result  that  the  spectral  magnitude  of 
speech  plus  noise  can  be  modeled  as  the  sum  of  magnitudes  of 
speech  and  noise.  This  phase  independent  model  allows  for 
noise  to  be  suppressed  by  subtracting  the  expected  noise 
spectrum  from  the  locally  averaged  speech  spectrum.  Using 
the  Spectral  Averaging  for  Bias  Estimation  and  Removal,  or 
SABER  method,  a noise  reduction  and  corresponding  signal  to 
noise  Improvement  of  15  dB  is  realized  on  both  digitally 
added  white  Gaussian  noise  and  acoustically  added  helicopter 
noise. 

Current  Results  on  Dual  Input  Nonstationary  Noise 
Suppression  Using  LMS  Adaptive  Noise  Cancellatlon-Dennls 
Pulsipher . 

The  previous  Semi-Annual  technical  report  described  the 
successful  application  of  the  two  microphone  Widrow-Hoff 
Least  Mean  Square  (LMS)  algorithm  for  removing  digitally 
added  noise  from  speech.  The  method  is  now  being  applied 
for  removing  nonstationary  acoustically  added  noise. 
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Preliminary  results  show  that  narrow  band  periodic  noise  can 
be  completely  eliminated  and  broad  band  colored  noise  can  be 
reduced  by  at  least  10  dB.  When  used  in  realistic  operatln/f' 
environments,  a filter  lenfrth  on  the  order  of  300  ms.  is 
required  for  broad  band  noise  reduction. 

Estimation  of  the  Parameters  of  an  Autoregressive-Moving 
Average  Process  in  the  Present  of  Noise-William  Done. 

This  task  considers  an  approach  to  parameter  estimation 
in  the  presence  of  noise  which  Involves  the  construction  of 
a new  model  which  explicity  accounts  for  the  effects  of 
noise . 


An  analysis  method  is  developed  for  parameter 
extraction  which  is  significantly  change  from  the  standard 
LPC  methods  used  when  no  noise  is  present.  It  is  shown  that 
the  addition  of  noise  to  an  all-pole  or  autoregressive  (AR) 
process  results  in  pole-zero  or  autoregressive-moving 
average  AFMA  process.  Methods  for  estimating  the  parameters 
of  the  ARHA  process  are  considered. 

Multirate  Signal  Processing-H . Ravindra 

The  aim  of  this  project  is  to  simulate  a system  on  a 
digital  computer,  which  can  Increase  or  decrease  the 
sampling  rate  of  a digitized  acoustic  signal.  These  two 
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operations  are  called  Interpolation  and  Decimation 
respectively.  This  project  is  the  first  phase  of  a*  larger 
project  which  involves  the  simulation  of  a CVSD  system,  with 
an  idea  of  studving  the  problems  of  tandeming.  Since  the 
CVSD  performance  ( s i gna 1-to-no ise  ratio)  is  better  at  higher 
sampling  rates,  an  Interpolation/Decimation  scheme  is 
required  to  translate  the  sampling  rate  from  6.67  KHz  to 
higher  rates. 


Simulation  of  Continuously  Variable  Slope 
Delta  Modulation  (CVSD)  H.  Ravindra 

A FORTRAN  CVSD  simulation  was  developed  and  implemented 
using  the  specification  defined  by  Joe  Tleney  in  Network 
Speech  Compression  note  15,  April  23,  197^.  Using  the 
Multirate  Signal  Processing  program  CVSD  coded  speech  can  be 
generated  at  rates  of  9.6,  16,  20,  and  32  KBPS  based  on 
input  speech  sampled  at  6.67  KHZ. 

Future  Efforts 

SABER  Development:  The  SABER  algorithm  will  be 
modified  to  work  in  a "stand-alone"  mode.  In  this 
Implementation  the  speech  will  be  windowed  and  transformed. 
The  spectral  magnitudes  will  be  averaged  and  the  noise  bias 
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removed.  Then  using  the  saved  phase  a time  signal  will  he 
regenerated.  This  implementation  will  allow  for  noise 
suppression  without  affecting  the  bandwidth  compression 
analyzer.  Also  intelligibility  and  quality  measurements 
using  the  DRT  will  be  conducted  on  the  processed  speech  and 
compared  with  scores  having  no  noise  suppression. 

Adaptive  Noise  Cancelling:  Fundemental  performance 
limits  for  the  method's  ability  to  reduce  acoustically  added 
noise  in  realistic  environment  will  be  established. 
Performance  of  the  method  in  nonstationary  noise 
environments  will  be  demonstrated.  Requirements  for  real 
time,  practical  implementation  will  be  specified. 

Parameter  Estimation  in  Noise:  Research  will  continue 
towards  development  of  effective  parameter  extraction 
methods  with  consider  noise  as  a fundamental  component  in 
the  modeling  process. 
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APPLICATION  OF  THE  SABER  METHOD  FOR 

IMPROVED  SPECTRAl  ANALYSIS  OF  NOISY  SPEECH 


Steven  F.  Boll,  Ph.  D 
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Chapter  I 


Abstract 


A method  is  developed  for  reducing:  the  effect  of 
acoustically  added  background  noise  when  spectrally 
analyzing  speech  using  Linear  Prediction.  Fundamental  to 
the  method  is  the  result  that  the  spectral  magnitude  of 
speech  plus  noise  can  be  modeled  as  the  sum  of  magnitudes  of 
speech  and  noise.  This  phase  Independent  model  allows  for 
noise  to  be  suppressed  by  subtracting  the  expected  noise 
spectrum  from  the  locally  averaged  speech  spectrum.  Using 
the  Spectral  Averaging  for  Bias  Estimation  and  Removal,  or 
SABER  method,  a noise  reduction  and  corresponding 
slgnal-to-noise  improvement  of  15dB  is  realized  on  both 
digitally  added  white  Gaussian  noise  and  acoustically  added 
helicopter  noise. 


Chapter  II 


Sumsiary 

This  report  describes  an  Integrated  noise 
suppression-speech  analysis  method  for  reducing  the  effect 
of  background  noise  when  spectrally  analyzing  speech  using 
Linear  Prediction.  Basically  it  is  shown  that  additive 
noise  exhibits  itself  as  a bias  added  to  the  desired  speech 
spectrum.  Through  spectral  averaging  this  bias  will  build 
up  allowing  it  to  be  effectively  removed  using  its  expected 
value  calculated  during  non-speech  activity.  The  method  is 
called  SABER,  an  acronym  for  Spectral  Averaging  for  Bias 
Estimation  and  Removal.  This  chapter  summarizes  the 
Objectives,  Assumptions,  Approach,  and  Results  for  the 
method.  Detailed  developments  are  provided  in  subsequent 
chapters  . 


Objectives 

1.  Develop  and  integrate  a noise  suppression  algorithm 
into  the  narrow  band  LPC  speech  analysis  algorithm. 

2.  Insure  that  the  algorithm's  effectiveness  should  be 
Independent  of  any  specific  environment's  noise 
characteristics. 

3.  Require  the  algorithm  to  only  need  a single  microphone. 
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In  implementing  the  algorithm,  a minimal  impact  should 
result  on  existing  narrow  band  systems, 
e.g.  The  same  channel  parameters  should  be  used,  thus 
allowing  for  the  new  system  to  be  compatible 
with  other  LPC  terminals. 

In  the  absence  of  noise  the  method  should  generate 
synthetic  speech  equivalent  in  intelligibility  and 
quality  to  standard  LPC  systems. 

The  method  should  not  only  be  able  to  improve  spectral 
resolution  but  also  improve  pitch  and  voicing 
estimation . 

The  method  should  use  standard,  well  understood 
estimation  techniques  and  be  implementable  in  real 
time. 


Assumptions 

The  background  noise  is  acoustically  or  electrically 
added  to  the  speech. 

The  background  noise  environment  remains  locally 
stationary  to  the  degree  that  its  spectral  magnitude 
expected  value  Just  prior  to  speech  activity  equals  its 
expected  value  during  speech  activity. 

If  the  environment  changes  to  a new  stationary  state, 
there  exists  enough  time  (on  the  order  of  300  ms)  to 
estimate  a new  background  noise  spectral  magnitude 
expected  value  before  speech  activity  commences. 


- 8 - 


<4,  If  the  rn  V i roiiiren  t.  chancres  to  a new  atationary  state, 

the  alrorithm  requires  a speech  activity  detector  to 
sij^nal  the  program  that  speech  has  ceased  and  a new 
noise  bias  can  be  estimated. 

Approach 

1.  The  iundamental  property  is  developed  which 

demonstrates  that  the  spectral  magnitude  of  the  noisy 
speech  can  be  effectively  modeled  as  the  sum  of 
magnitudes  of  speech  and  noise.  This  result  is  called 
the  phase  Independent  model. 

2.  Eiased  upon  the  phase  Independent  model,  an  estimate  of 
the  speech  magnitude  spectrum  is  calculated  as  follows: 

a.  The  noisy  speech  magnitude  is  averaged  over 
stationary  vocal  tract  intervals.  Averaging  yields 
a low  variance  bias  noise  tern  added  to  the  speech 
spectrum. 

b.  This  sample  mean  is  then  removed  by  subtracting  the 
expected  noise  spectrum  from  the  averaged  speech 
spectrum . 

c.  Negative  magnitude  frequency  components  are  then 
removed  to  further  increase  noise  rejection. 

d.  The  resulting  magnitude  spectrum  is  then  squared 
and  inverse  transformed  yielding  autocorrelations. 

e.  from  these  autocorrelations  predictor  and 
reflection  coefficients  are  calculated  using  the 
Levinson's  recursion. 
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3.  In  addition,  pitch  and  voicing  information  are 
calculated  from  the  noise  cancelled  spectrum  usinn'  a 
cepstral  pitch  tracker. 

Results 

1.  The  method  will  suppress  white  noise  up  to  the  limit 
allowed  through  reduction  in  variance  of  the  sample 
mean.  As  shown  in  Section  VI,  the  expected  value  of 
noise  reduction  equals  approximate  15dB. 

2.  It  is  shown  that  the  variance  between  actual  speech 

spectrum  and  the  SABER  estimate  equals  the  variance  of 

I the  sample  mean  of  the  additive  noise.  This  coupled 

with  the  fact  that  a 15dB  reduction  in  noise  energy  is 
achievable  suggested  the  following  experiment  for 
measuring  signal-to-noise  Improvement.  Speech  having 
an  average  SNR  of  25dB  was  processed  with  linear 
prediction  and  compared  with  speech  having  a SNR  of 
10dB  and  processed  with  SABER.  Both  informal  listening 
tests  and  spectral  comparisons  demonstrate  that  the  two 
outputs  are  essentially  equivalent.  This  experiment 
suggests  that  a 15dB  improvement  in  SNR  is  possible 
using  SABER. 

3.  Preliminary  experiments  demonstrate  that  correct  pitch 

values  can  be  recovered  when  applying  a cepstral  pitch 

tracker  to  the  noise  cancelled  SABER  spectral  estimate. 

V 

i 

ji- 
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Chapter  III 


► 


System  Description 

Introduction 

This  chapter  describes  the  various  algorithm  stages  for 
implementing  the  SABEfi  method.  Theoretical  justification  in 
support  of  these  procedures  is  provided  in  the  subsequent 
chapters  . 

Data  Buffering 

Data  from  the  A/D  converter  is  stored  in  a buffer 
system  which  is  similar  to  standard  vocoder  designs.  The 
analysis  frame  length  should  be  at  least  twice  as  large  as 
the  maximum  expected  pitch  period  for  adequate  frequency 
resolution  [1].  The  analysis  frames  are  advanced  in  time  by 
overlapping  by  one-half  the  window  length.  As  is  shown  in 
Appendix  A,  the  one-half  overlap  is  optimum  for  minimizlnr 
the  variance  of  the  sample  mean  of  the  magnitude  spectra. 

, Spectral  Magnitude  Calculation 

The  data  in  each  analysis  buffer  is  windowed  with  a 
Hamming  window.  The  buffer  length  is  then  doubled  by 
i extending  with  zeros.  Padding  with  zeros  is  necessary  since 

^ the  autocorrelations  required  for  the  Levinson's  recursion 
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are  obtained  by  inverse  transforming  the  squared  majrnitiide 
frequency  spectrum.  Therefore  to  prevent  temporal  aliasing 
due  to  the  circular  convolutional  property  of  the  DFT , the 
zero  extension  is  necessary.  Following  augmentation  the  DFT 
of  the  buffer  is  taken  and  the  spectral  magnitude  is 
computed  : 

|X{k)|  = (X^(k)  + X^(k))^^^  k = 0,  1,  ....  L - 1 

where 

XR(k)  + jXj(k)  = DFT{x(j)} 


Marnitude  Averaging 

As  is  shown  In  Chapter  VI,  the  variance  of  the  noise 
spectral  estimate  is  reduced  by  averaging  over  as  many 
spectral  magnitude  sets  as  possible.  However  the 
non-stationarity  of  the  speech  limits  the  total  time 
interval  available  for  local  averaging.  The  number  of 
averages  is  limited  by  the  number  of  analysis  windows  which 
can  be  fit  into  the  stationary  speech  time  Interval.  If 
only  reflection  coefficients  are  to  be  estimated,  a 128 
point  analysis  can  be  used,  resulting  in  a five  set  average 
over  a 384  point  stationary  speech  time  interval.  If  both 
reflection  coefficients  and  pitch  are  to  be  estimated  from 
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the  same  analysis  window,  then  a 256  point  window  must  be 
used,  resulting  in  two  set  average. 


Noise  Bias  Estimation 

The  SABER  method  reouires  an  estimate  of  the  expected 
value  of  the  noise  magnitude  spectrum , 

Ufj  = EflNl) 

This  estimate  is  obtained  by  averaging  the  signal  magnitude 

spectrum  |X|  during  non-speech  activity.  Estimating  in 

this  manner  places  certain  constraints  when  Implementing  the 

method.  If  the  noise  remains  stationary  during  the 

subsequent  speech  activity,  then  an  initial  startup  or 

calibration  period  of  noise-only  signal  is  required.  During 

this  period  (on  the  order  of  a third  of  a second)  an 

estimate  of  can  be  computed.  If  the  noise  environment 

is  nonstationary  then  a new  estimate  of  must  be 

calculated  prior  to  bias  removal  each  time  the  noise 

spectrum  changes.  Since  the  estimate  is  computed  using  the 

noise-only  signal  during  non-speech  activity,  a voice  switch 

I is  required.  When  the  voice  switch  is  off  an  averaged  noise 

spectrum  can  be  recomputed.  If  the  noise  magnitude  spectrum 

is  changing  faster  than  estimate  of  it  can  be  computed,  then 

time  averaging  to  estimate  uu  cannot  be  used.  Likewise  if 

N 

the  expected  value  of  the  noise  spectrum  changes  after  an 
estimate  of  it  has  been  computed,  then  noise  reduction 


S 
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through  bias  removal  will  be  less  effective  or  even  harmful. 

Thus  in  summary,  an  estimate  of  the  expected  noise 
spectrum,  . is  calculated  by  averaging  the  noise  signal 

taken  during  non-speech  activity.  This  approach  not  only 
requires  a speech  activity,  detector  and  a short  segment  of 
noise  only  signal  prior  to  speech  activity,  but  requires 
that  the  noise  spectrum  remain  slowly  varying  with  respect 
to  the  bias  estimation. 

Noise  Bias  Removal 

The  SABER  spectral  estimate  is  obtained  by 

subtracting  the  expected  noise  magnitude  spectrum  from 

the  averaged  magnitude  signal  spectrum  fX|.  Thus: 

S^(k)  = |X(k)|  - g^(k)  k = 0,  1 L - 1 


Where  L=DFT  buffer  length. 

After  subtracting,  the  differenced  values  having 
negative  magnitudes  are  set  to  some  small  positive  value 
relative  to  the  average  energy.  These  negative  differences 
represent  frequencies  where  the  sun  of  speech  plus  local 
noise  is  less  than  the  expected  noise.  As  is  shown  In 

Chapter  VI,  replacing  negative  differences  with  small 
positive  values,  results  in  an  smaller  spectral 
approximation  error. 
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LFC  Coefficient  Calculation 


The  LPC  coefficients  corresponding  to  the  spectrum  S/^ 
are  calculated  by  initially  squaring  the  magnitude  spect-ura. 
The  inverse  DFT  of  the  resulting  power  spectrum  is  then 
computed.  The  output  of  this  transform  is  a set  of 

autocorrelations,  R(k),  k = 0,  1 L-1.  A set  of  M LPC 

predictor  or  reflection  coefficients  are  calculated  from  the 
first  M+1  autocorrelations,  using  the  Levinson's  recursion 
[2].  The  coefficients  will  represent  an  LPC  all-pole 

spectral  fit  to  the  spectrum 

Gain  Calculation 

There  are  two  choices  for  the  gain  term  needed  for 

synthesis.  Either  the  rms  of  the  noisv  slrnal,  g^  or  the 

rms  of  the  noise  cancelled  signal,  g 3.  These  gain  terms 
can  be  computed  as: 


9s  = 

where  x(k)  = s(k)  + n(k) 

and  R(k)  * lOFTfS^^^) 
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It  was  decided  that  when  speech  activity  is  present 

that  K is  used  and  when  speech  activity  is  absent  that  r ^ 

is  used.  This  choice  has  the  desirable  effect  of  amplifying 

the  synthesis  output  during  speech  activity  and  attenuating 

the  synthesis  output  during  non-speech  activity,  since  g^^ 

will  be  greater  than  g . The  procedure  for  determining 

s 

whether  to  use  g^^  or  was  to  examine  the  energy  change 
before  and  after  noise  removal.  Let  R equal  the  amount  of 
energy  reduction  in  dB.  Then: 

R = 20  log,0  ^ 

During  the  non-speech  noise  bias  estimation  time 
period,  values  of  R taken  each  analysis  frame  are  averaged 
together  giving  an  average  estimate  of  noise  power  reduction 
1<.  As  is  shown  in  Section  VI,  this  average  value  for  white, 
Gaussian  noise  is  about  15dB.  Speech  activity  is  detected 
by  comparing  the  current  value  of  R with  If  the  current 

value  of  K is  within  5dB  of  the  average,  the  g^  is  picked  as 
the  gain  term.  If  the  current  value  is  smaller  than  the 
average  by  5dB  or  more,  then  g^^  is  used.  Again  the 
reasoning  behind  this  procedure  is  that  in  the  absence  of 
speech  activity  the  power  reduction  should  be  largest  and 
near  its  expected  value.  With  speech  present,  a larger 
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percentage  of  the  spectrum  is  now  speech,  thus  subtracting 
off  the  noise  bias  will  only  slightly  reduce  the  total 
power.  Thus  R^and  R<<n. 

There  are  difficulties  with  this  detection  procedure 
however.  If  the  signal-to-nolse  ratio  is  low  R<<R  even 
during  speech  activity.  This  results  in  the  wrong  gain  term 
being  chosen  and  the  speech  synthesis  is  attentuated  rather 
than  amplified.  Likewise  at  the  other  extreme  if  the  noise 
reduction  value  for  the  current  frame  drops  5dB  below  the 
average  during  non-speech  activity,  the  synthesis  will  be 
amplified.  The  5dB  value  was  an  empirically  determined 
threshold . 


Pitch  Detection 


When  the  noise  contains  periodic  components  in  the 
frequency  range  used  for  pitch  detection,  the  pitch  tracker 
can  track  the  noise  rather  than  the  pitch.  If  these 
harmonics  are  first  removed  by  the  SABER  method,  then  a 
pitch  tracker  which  uses  spectral  magnitude  information  can 
be  used  to  extract  the  actual  pitch  period.  One  such  pitch 
detection  scheme  is  the  cepstral  pitch  tracker  [3]  After 
computing  the  log  is  taken  followed  by  a DPT.  During 
voiced  speech,  the  real  cepstrum  will  exhibit  a spike  at  a 
distance  from  the  origin  equal  to  the  pitch  period.  If 
pitch  detection  is  to  be  done  based  on  the  SABER  output 
spectrum,  then  a sufficiently  long  analysis  time  window  is 
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required.  A 256  point  analysis  window  based  on  a 6.67  KHz 
samplinR  rate  was  used  for  cepstral  pitch  tracking. 


System  Block  Diagram 


Chapter  IV 


Results 

Introduction 

This  chapter  presents  the  results  of  three  experiments 
designed  to  demonstrate  the  improvements  in  noise  deduction 
and  spectral  resolution  solving  from  the  apclication  of  the 
SABER  algorithm  to  speech  analysis.  Two  types  of  noisy 
speech  were  used.  In  experiments  one  and  two  Gaussian  noise 
was  digitally  added  to  clean  text  to  produce  two  data  bases 
having  specified  signa 1-to-noise  ratios  of  lOdB  and  25dB. 
These  controlled  data  bases  allowed  for  access  to  both  the 
clean  and  noisy  speech.  In  experiment  throe  speech  recorded 
in  a helicopter  environment  with  acoustically  added  noise 
was  used  [4].  Synthetic  speech  was  generated  from  each  data 
base.  Results  consist  of  synthesis  time  waveforms  and 
corresponding  all-pole  spectra  when  standard  linear 
prediction  analysis  and  SABER  analysis  are  used. 

Experiment  on  10dB  SNR  Data  Base 

To  determine  the  improvement  in  spectral  resolution 
obtainable  from  the  SABER  algorithm,  a controlled  data  base 
was  constructed.  Broad  band  Gaussian  noise  was  digitized 
from  a standard  analog  noise  generator.  Clean  text  was 
recorded  in  an  acoustically  shielded  sound  proof  room  having 


- 19  - 


an  ambient  noise  level  of  i?7dB.  Both  speech  and  noise  were 
filtered  as  3.2KHz  and  sampled  at  6.67KHz.  The  averapie 
signal  energy  of  each  file  was  measured  [*4]  and  the  noise 
was  scaled  and  added  to  the  speech  to  give  an  average 
s Igna 1 - to-no ise  ratio  of  lOdB.  The  file  energy  calculation 
was  taken  over  both  speech  and  silent  Intervals.  The 
vocoder  analysis-synthesis  program  was  modified  to  process 
either  the  clean  or  noisy  speech.  Three  types  of  synthetic 
speech  were  generated:  LPC  on  clean  speech  (LPC  on  S);  LPC 
on  the  noisy  speech  (LPC  on  S + N ) ; and  SABER  on  the  noisy 
speech  (SABER  on  S+N ) . Note  in  the  absence  of  noise,  as 
shown  in  Chapter  VI,  SABER  reduces  to  a standard  LPC 
analysis,  and  therefore  SABER  on  clean  speech  was  not 
generated. 

The  first  set  of  figures.  Figure  IV. 1 A through  D show 
synthetic  waveforms  and  their  all-pole  spectra  analyzed  from 
the  vowel  in  the  word  "dogs".  Part  E shows  that  second  and 
third  formants  are  clearly  resolved  using  SABER,  while 
considerably  obscured  using  LPC.  However,  part  D shows  that 
complete  noise  cancellation  was  not  achieved.  This  was  to 
be  expected  since  the  clean  speech  had  a SNR  in  excess  of 
10>15  or  25dB. 

Figures  IV. 2 A through  D show  synthetic  waveforms  and 
their  all-pole  spectra  analyzed  from  the  fricative  | sh|  in 
"shade".  Again  better  but  not  perfect  spectral  resolution  | 

is  achieved  by  the  SABER  algorithm  over  standard  LPC.  | 
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SABER  at  10  dB  vs  LPC  at  25  dB 


9 999t*\ 

I 1MC«2  LPC  AT  25  DO 


(B) 

SABER  at  10  dB  vs  LPC  at  25  dB. 


Time  and  Frequency  Functions  for  10  dB  vs  25  dB 
Figure  IV. 3 


Comparisons  Between  10dB  and  25dB  SNR 


^ funi a mental  quertion  when  evaluating  the  amount  of 
noise  rejection  is:  "After  processing,  how  much  improvement 
was  there  in  the  s igna 1-to-noise  ratio?"  Since  signal  and 
noise  energies  cannot  be  separated  and  measured  after 
processing,  it  is  not  possible  to  measure  an  SNR  improvement 
directly.  However  the  following  indirect  method  can  be 
used.  If  as  is  shown  in  Chapter  VI,  the  amount  of  noise 
rejection  is  15dB  for  white  Gaussian  noise,  then  synthetic 
speech  generated  by  the  SABER  algorithm  using  the  10dB  SNR 
data  base  should  sound  approximately  equal  to  the  synthetic 
speech  generated  by  LPC  using  the  25dB  SNR  data  base. 

Such  an  experiment  was  conducted  and  the  results  are 
given  in  Figure  IV. 3 A and  B.  Examining  Figure  B shows  that 
the  all-pole  spectra  for  LPC  at  25dB  has  less  overall 
energy.  This  is  to  be  expected  since  less  noise  was  added 
to  the  clean  text  to  arrive  at  a 25dB  SNR.  Other  than  the 
gain  difference,  the  spectra  are  approximately  equal. 
Informal  listening  tests  supported  the  results  shown  here, 
in  that  LPC  synthesis  with  a 25dB  input  SNR  was  essentially 
indistinguishable  to  within  a gain  factor  from  SABER 
synthesis  with  a 10  dB  input. 
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Experiments  on  helicopter  Speech 


An  audio  test  tape  used  in  the  National  Security 
Agency's  consortium  testing,  containing  speech  recorded  in  a 
helicopter  environment  was  processed.  The  speech  was 
filtered  at  3.2KHz  and  sampled  at  6.67KHz.  Synthesized 
speech  was  generated  using  LPC  and  using  SABER.  This 
experiment  represents  true  field  conditions  since  the  noise 
and  speech  are  acoustically  added  at  the  microphone.  Figure 
IV. 4 A and  B show  time  and  frequency  functions  for  LPC  and 
SABER  syntheses  during  noise  only  input.  Figure  A shows 
synthetic  waveforms  using  the  same  vertical  scale.  The 
average  energy  difference  was  measured  at  15.1dB.  Figure  B 
shows  the  all-pole  spectra.  The  periodic  harmonics  at 
multiples  of  750hz  are  clearly  evident  in  LPC  spectrum 
(upper  trace)  with  the  second  and  fourth  harmonics  dominant 
at  about  92dB.  The  all-pole  spectrum  corresponding  to  the 
SABER  spectral  estimate  is  given  in  the  lower  trace. 

Figure  IV. 5 A through  D show  time-frequency  pairs  for 
the  vowel  |u  1 in  "squirrels"  and  the  fricative  |ah  | in 

"bushy".  Note  in  Figure  B that  the  SABER  spectral  estimate 
clearly  separates  the  low  first  and  second  formants  as  well 
as  shapenlng  the  third  formant  at  about  1800  Hz.  Also  the 
noise  peak  at  3KHz  present  in  the  LPC  spectrum  is  now  absent 
in  the  SABER  spectrum. 
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Chapter  V 


Analysis  of  the  Phase  Independent  Model 

Introduction 

Basic  to  the  understanding  of  the  SABER  algorithm  Is 
the  result  that  magnitude  of  the  noisy  speech  spectrum  can 
be  accurately  approximated  by  the  sum  of  the  magnitudes  of 
speech  and  noise.  This  chapter  describes  this  phase 
independent  model  and  develops  an  error  analysis  for  Judging 
the  effectiveness  of  the  approximation.  For  notational 
convenience  upper  case  symbols  will  denote  Fourier 
transforms  and  lower  case  symbols  their  inverse  transforms. 


Thus 


X = Meh  = I x{k)e'j‘^'' 

k=.co 


and 


X = x(k) 
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Additive  Noise  Model  and  Zero  Phase  Approximation 

Assume  that  a noise  signal  n,  has  been  added  to 
speech  signal  s,  with  their  sum  denoted  as  x.  Then 

X = s + n 

Taking  the  Fourier  transform  gives 

X = S + N 


The  desired  speech  spectral  magnitude,  |S|  is  given  by 


|S|  = |X  - N| 


with  its  squared  magnitude 

|S|^  = SS*  = |X|^  + |N|^  - 2|X||N|cos(ej^  - 0,^) 

where  * denotes  complex  conjugate,  the  phase  of  X and  0 
the  phase  of  N. 

The  zero  phase  approximation  to  |S|  is  given  by 

= |X|  - |N| 


with  Its  squared  magnitude 


Error  Analysis  of  Zero  Phase  Approximation 


The  primary  consideration  of  the  speech  analyzer  is  an 

accurate  estimation  of  the  squared  magnitude  function. 

2 2 

Taking  the  difference  D between  |S|  and  |S^|  gives 
D = |S|^  - = 2|X||N|(1  - cos(ej(  - 0^)) 


or 


D = - XN*  - NX*  + 2|X| |N| 


A form  for  D which  explicitly  shows  its  relation  to  S 
and  N can  be  developed  by  noting  that  D can  be  written  in 
the  form  of  a perfect  square. 


Thus 


D = - 


or 


- /fs+N)N*  - /fs^)*N 


After  some  manipulation  D oan  be  written  in  the  following 
form  which  explicitly  shows  its  dependence  on  the 
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signal-to-noiae  ratio  |S/N|  • 


2 

In  this  unfortunately  complicated  form,  the  zero  phase 
approximation  error  D,  can  be  analyzed  as  a function  of  the 
signal -to-noise  ratio. 

Extreme  Error  Values 

Again  remember  that  all  symbols  are  functions  of  radian 
frequency  m.  The  error  D will  of  course  by  zero  whenever  S 
and  or  N are  zero. 

A worse  case  condition,  that  is  when  D is  maximum,  will 
occur  when 

9„  - 65  . 1 ./2 

Then 


i 


D = - 


/ 1 - JIS/nT  - /T  + MS/H\ 


D = 


/ 


|S/N|e  5 ^ M 


/lS/N|e  ^ + 1 
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or  after  squaring 


For 


D = 2|Nr 


/ IS/Nl' 


+ 1 


|S/N|  « 1 D % 0 

For 

|S/N|  » 1 D ^ 2|N|lS| 

The  relative  error  is 

D/lS|^  = 2|N|/|S|  « 1 


Finally,  for  |S|  ^ |N| 


D 


2|Nr 


■/7=  - 1' 


This  situation  is  depicted  graphically  as 


i 


Here  the  relative  error  is  >tiven  by 


D/|S|^ 


2 


Jl 


-6dB. 


Summary 

This  chapter  investigated  the  spectral  error  induced  by 
usinr  a zero  phase  approximation  to  the  mafrnitude  spectrum. 
The  worse  case  condition  occurs  when  the  magnitudes  of  S and 
N are  equal  and  out  of  phase  by  ninety  degrees.  For  other 
situations  the  error  was  negligible  or  small  compared  to  the 
speech  spectrum.  Informal  listening  tests  .judged  the  LPC 
synthesis  based  upon  the  zero  phase  power  spectrum  to  be 
essentially  Indistinguishable  from  LPC  synthesis  based  upon 
the  actual  power  spectrum.  Although  differences  can  be 
detected,  they  become  inconsequential  compared  to  the  noise 
cancellation  capabilities  provided  by  using  the  zero  phase 
approximation.  The  application  of  the  zero  p'qase  model  to 
noise  suppression  is  developed  in  the  next  chapter. 


Chapter 


VI 


Analysis  and  Reduction  of  Estimation 
Error  using  the  SABER  Method 

Introduction 

Using  the  result  of  Section  V that  the  magnitude  of 
speech  plus  noise  can  be  accurately  approximated  by  the  sum 
of  the  zero  phase  estimate  of  speech  and  noise  magnitude, 
the  following  linear  model  is  implied: 

|X|  = + |N| 


where 

S^  = zero  phase  approximation  to  the  magnitude  of  the 
Fourier  transform  of  the  windowed  speech,  s(k) 

|n)  = Magnitude  of  the  Fourier  Transform  of  the  Additive 

windowed  noise  n(k). 

and 

|x|  = Magnitude  of  the  spectrum  of  s(k)  + n(k) 

Again  upper  case  symbols  denote  Fourier  transforms  of  lower 
case  symbols.  This  model  shows  that  the  noise  spectrum,  |n| 
enters  in  as  a spectral  bias  added  to  the  desired  speech 
spectrum,  82.  The  more  accurate  this  bias  can  be  estimated, 
the  more  accurate  will  be  the  resulting  estimated  spectrum 
obtained  by  subtracting  the  bias  estimator  from  the  noisy 
speech  magnitude,  | X |.  This  section  describes  an  approach 


29  - 


to  bias  estimation  and  removal  using  short  time  averaging. 


The  Non-Averaged  SABER  Estimate 

Assume  that  the  additive  noise  n(k)  Is  stationary  and 
thus  that  the  expected  value  of  the  noise  magnitude: 

E{|N|}  = 

Is  constant  over  time.  (In  practice  Is  estimated  by  a 
time  average  taken  during  non-speech  activity.) 

A spectral  estimate  S/^  of  can  be  defined  as 

Sa  = |X|  - 

The  error  r.  in  approximat  ing  by  Sa  is 

e = Sa  - = |X|  - - |X|  + |N| 

or 

c - |N|  - 

Using  this  estimate  the  value  of  the  error  equals  the 
difference  between  the  magnitude  of  the  noise  spectrum  and 


its  expected  value. 


Reduction  in  Error  Through  Averaging: 

The  SABER  Spectral  Estimate 

A straight  forward  method  for  reducing  the  spectral 
error  f is  through  averaging.  Averaging  magnitude  spectra 
|x(l)|  taken  from  possibly  over  lapping  time  windows  gives 

Tx|  = I I |XH)I  = J-  I s,(i)  ♦ |N(I)I 

i = l i = l 

or 


ixl  = + 1N| 


The  SABER  spectral  estimate  is  given  by: 

i,  = Tm  - 


The  spectral  error  c 


in  approximating  S^  by  S^  is 


^ ■ l'|ij  - |X|  + |n| 


or 


f ■ INI  - u. 
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Assuming  first  that  the  zero  phase  approximation  is 

an  accurate  approximation  of  | S | as  argued  in  Chapter  V and 
second  that  during  the  total  time  segment  over  which  the 
averages  are  taken  that  the  speech  spectra  S^Ci)  remain 
essentially  constant,  then 

|s|  ^ 

Thus  the  averaged  spectral  magnitude  error  ^ 

represents 

c - I N I ■ ^ 

Assuming  stationarity  this  shows  that  as  more 

time-averaged  spectra  are  used,  the  sample  mean  ]»J  will 

converge  to  and  will  converge  to  |S|  . Unfortunately 

the  nonstationarlty  of  the  speech  limits  the  time  interval 

over  which  averages  can  be  taken.  Thus  the  error  can  only 

be  reduced  to  the  extent  to  which  the  sample  mean  |N|  has 

converged  to  the  mean  y . 

N 

In  terms  of  mean  squared  error,  the  variance  of  the 

error 


0^  = var(c)  = Ef(S^  - S^)^) 

equal.-  ♦he  variance  of  the  sample  mean  of  the  magnitude 
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noise  spectrun: 


= VAR(|N|)  = E{(|N|  - 
|N| 

The  error  will  be  minimized  to  the  same  extent  that  the 
variance  of  the  sample  mean  has  been  minimized. 

In  summary,  the  noise-suppression,  signal  estimation 
problem,  using  the  zero  phase  approximation  linear  model, 
can  be  reduced  to  a spectral  estimation  problem. 
Unfortunately  due  to  the  nonstatlonarity  of  the  speech,  only 
a limited  number  of  spectral  time  averages  can  be  used  to 
minimize  the  variance  of  the  sample  mean.  Thus  complete 
noise  cancellation  is  not  possible.  At  this  point 
Justification  for  the  name  SABER  is  apparent.  The  noise 
spectrum  shows  up  as  a bias  on  the  desired  signal.  To 
remove  the  bias,  averages  are  taken  for  as  long  as  the 
underlying  speech  remains  stationary.  Averaging  will  reduce 
the  variance  of  the  local  noise  bias.  The  bias  can  be 
partially  removed  by  subtracting  off  the  expected  value  of 
noise  bias.  The  smaller  the  variance,  the  better  the  noise 
reduction . 
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Expected  Value  of  Noise  Reduction 


When  the  additive  noise  n(k)  is  zero  mean,  white,  and 
Gaussian,  an  estimate  of  the  expected  value  of  noise 
reduction  can  be  computed.  In  turn  the  amount  of  noise 
reduction  can  be  equated  to  the  variance  of  the  error 
between  the  SABER  estimate,  and  the  speech  spectrum. 
Specifically,  from  the  last  chapter  let 


Define  as  the  variance  or  average  power  of  the 
additive  noise.  Then 

% = E{n^(k)} 


Using  the  results  of  Appendix  B,  the  expected  value  of  the 
noise  magnitude,  , having  a x distribution  of  order  two 
will  equal: 


Therefore  ' hf  varian-e  of  the  magnitude  will  equal 


n(!N|  - - t(|N|^l  - 11^  - (/(I  - J)L 

Without  averaging,  by  simply  subtracting  the  mean  from 
the  magnitude  spectrum,  the  expected  value  of  noise 
reduction  will  equal  6.68dB: 


10  log^Q  = 6.68  dB 

E{(|N|  - 


Further  Reduction  in  Variance  Through  Averaging 

The  noise  variance  can  be  reduced  by  averaging 
magnitude  spectra  taken  from  possibly  overlapping  time 
windows.  This  technique  has  been  carefully  Investigated  by 
Welch  [5].  A summary  of  his  variance  analysis  is  described 
in  Appendix  A.  Again  because  of  the  nonstationarity  of  the 
speech  spectra,  only  a limited  time  interval  is  available 
for  averaging.  Using  Welch's  formulation,  the  variance  is 
minimized  by  first  specifying  the  type  and  length  of  window 
required,  second  the  time  interval  available  for  averaging, 
and  then  averaging  magnitude  spectra  based  upon  windowed 
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data  overlapped  by  one-half  a window  length 


For  this 


Implementation,  a 19.2  ns  Hamming  window  was  used  with  a 
time  interval  of  57.6  ms.  This  allowed  for  a total  of  five 
windows  to  be  used.  The  variance  of  the  sample  mean  after 
averaging  equaled 

E{(fNT  - = 0.?75  Van  |N|>  - (0.  ?75)  ( . 21  )c.^L 

The  expected  value  of  noise  reduction  is 


Discussion 

The  choice  of  window  length  and  averaging  interval 
values  represent  a compronlse  in  conflicting  requirements. 
For  acceptable  spectral  resolution  a window  length  greater 
than  twice  the  expected  largest  pitch  period  is  required 
[1].  For  minimum  variance  a large  number  of  windows  are 
required  for  averaging.  Finally,  for  acceptable  time 
resolution  a narrow  analysis  interval  is  required.  Using  a 
Hamming  window  the  effective  averaging  time  interval  reduces 
from  57.6  ms  to  approximately  39  ms  due  to  the  window  edge 
attenuation.  k 19.2  ms  window  length  has  been  found  to 
result  in  acceptable  frequency  resolution  [6].  Thus  to 

i 

\ 

1 
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achieve  better  noise  reduction  some  conpronlse  in  both  time 
and  frequency  resolution  was  necessary. 


Negative  Magnitude  Stripping 


An  additional  technique  for  reducing  the  spectral  error 
is  to  replace  with  zero  when  ever  the  difference  goes 

negative.  During  non-speech  activity  this  will  on  the 
average  reduce  the  resulting  noise  power  in  half  for  an 
additional  3dB  improvement.  When  speech  is  present  and 
is  larger  then  [Xf  the  spectral  error  between  S^  and  will 
be  larger  than  if  zero  is  used  for  the  estimate.  That  Is 


|X|  < 


-Sa>0 


(Sz  - > (S^  - 0)^ 


I 


^7  - 


Summary 


This  section  developed  an  equality  between  the  error  In 
the  SABER  spectral  estimate  and  the  variance  of  the  sample 
mean  of  the  additive  noise  spectral  magnitude.  For  additive 
noise  which  Is  white,  zero  mean,  and  Gaussian,  the  expected 
value  of  noise  reduction  resulting  from  averaging, 
subtracting  off  the  mean  and  zeroing  out  negative  difference 
components  was  approximately  15d6. 

I 


V 
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Appendix  A 


Variance  Reduction  Through  Averaging  Magnitude  Spectra 

Introduction 

It  was  shown  in  Chapters  V and  VI  that  the  noise 
spectrum  shows  up  as  a bias  on  the  desired  signal.  To 
remove  -the  bias,  averages  are  taken  for  as  long  as  the 
underlying  speech  remains  stationary.  Averaging  will  reduce 
the  variance  of  the  local  noise  bias.  The  bias  can  be 
partially  removed  by  subtracting  off  the  expected  value  of 
noise  bias.  The  smaller  the  variance,  the  better  the  noise 
reduction.  Due  to  the  nonstationarlty  of  the  speech  only  a 
limited  time  Interval  Is  available  for  averaging.  This 
appendix  reviews  an  analysis  published  by  Welch  [5]  for 
determining  the  variance  reduction  possible  as  a function  of 
averaging  Interval,  M,  window  length,  L,  window  shape 
w(J),  J = 0,1,  ...,  L - 1,  and  overlap  interval  D. 

Using  the  results  of  this  analysis  the  amount  of 
variance  reduction  was  calculated. 

Data  Segmentation 

Define  x(J)  J s 0,  1,...,  M - 1 to  be  samples  of  a 
second  order,  stationary  stochastic  sequence.  Data  segments 
x^(J),  possibly  overlapping  of  length  L are  defined  with 
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starting  points  D units  apart.  Thus  define 


X J ) = X ( j ) 

X2(J)=x(J-i.D)  J = 0,  1 L-1 

x,^(J)  = x(J  ♦ (K  - 1)D) 

Assume  there  are  K segments  covering  the  entire  averaging 
interval : 


(K  - 1)D  + L = M 


Magnitude  Spectrum  Calculation  and  Averaging 

Using  ^ data  window,  w(J)  (for  this  analysis  a Hamming 
window  was  selected),  the  windowed  data  sequences  are 
formed  ; 


{x(  J )w(  J ) } , . . . {X|^(  J)w(  J)  } 

For  each  windowed  data  sequence,  the  discrete  Fourier 
transform  is  calculated: 


L-1  -j  -f-  iH 

X.(0  = I X.  (i)w(1)e  *■  £ = 0.  1 L-1 

* 1 =0 
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followed  by  the  magnitude  spectrum: 


Averaging  the  spectral  estimates  gives 


\m\  = { I ix  («)i 

^ k=l 


Variance  of  |X| 


Define  the  covariance  of  |X(  as 


d(j)  « cov{  |X|^(«,)  I , |X,^^j{^)n 


Then  It  can  be  shown  that 


K-1 


var{|X()l)l}  = ^^(0)  + 2 d(j); 


J 


Further,  defining  the  correlation  of  |X|  as 


p(j)  " corre1ation{|X|^((l)|,  |X,^^j(0|}  = 


then 


var{  |X(J»)  I ) = 


var{X^(fc)} 

K 


I K-l 
1+21 
J = 1 


Assuming  that  the  spectrum  of  x(j)  is  flat 
Gaussian,  the  correlation  p(j)  of  |x|  is  given  by 


and 


L-1  L-1  p 

p(j)  = I w{k)w(k  + jb)/  I w‘^(k) 
k=0  k=0 


Minimum  Variance  Determination 


For  a fixed  averaging  interval,  M,  window  length,  L, 
and  window  shape  w(J),  the  optimal  spacing  D can  be 
calculated  to  minimize  the  expression: 
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< 


F 


1 

K 


K-1 

1 + 2 y 

j = l 


Note  that  p( j ) is  greater  than  or  equal  to  zero.  For 
D ' L (no  overlap)  p(  J ) = 0 and  the  variance  will  reduce  as 
1/K.  By  overlapping  segments,  K increases  at  the  expense  of 
an  increasing  p(J).  After  some  calculation  the  minimum  as 
suggested  by  Welch,  was  found  to  occur  for  D = L/2,  that  is, 
overlap  by  one-half  a window  length. 

When  using  the  following  parameters  imposed  by  the 
speech  analysis  constraints: 

M = 384 
L = 128 
D = 64 
K = 5 

the  resulting  value  for  F was  calculated  to  be  0.275. 
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Appendix  B 


Calculation  of  the  Mean  and  Variance 
Introduction 

This  appendix  develops  estimates  for  the  mean  and  variance 
of  the  magnitude  spectrum  for  white,  zero-mean  Gaussian 
noise.  Of  course,  this  development  is  not  original,  with 
this  version  taken  in  part  from  Ingebretsen  [?]• 


Mean  and  Variance  of  the  Fourier  Transform 

Let  x(k)  be  a zero  mean,  white  Gaussian  sequence  with 
X(e'^‘^)  its  Fourier  transform.  Then 


X(cJ'^)  - Xp{eJ'‘’)  + jXj(eJ"’)  = );x(k)e* 

k 


where  Xp  and  Xj  are  the  real  and  imaginary  parts  of  X. 
X^and  Xj  will  be  normal  since  x(k}  is  normal.  In  addition 


EfX^}  = EfXj}  = 0 


It  can  be  shown  [8]  that 


•Vo, 

cov{Xj^(e  ' 


JU)y 

).  X^(e 


)}  = covfXj(e 


jw- 

)•  X,(e  ^)|  r 0 


for  / (A)^ 

J‘**1  ji'ij 

Thus  the  vector  pairs  (X  (e  ),  X-Ce  ^ ))  and 
(Xj(e  ),  Xj  (e  ^))  are  Independent  since  X is  Gaussian. 
Likewise,  X^^  and  Xj  are  independent  of  each  other. 

Using  the  indapendence  of  x{k),  the  variance  of  the 
real  and  imaginary  part  of  X(eJ“)  is  given  by 

var{Xj^(eJ“)}  = E{^x^(k)cos^(ka,)) 

or 

varfXj^(eJ“)}  = o^ycos^(ka)) 


where 


E{x^(k)}  = 0^ 


Assuming  the  transform  is  taken  using  a window  of 
length  L samples  then 
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2 I 

J^COS  (k(i))  = 5 + 

k 


1 SIN  Lo) 

2 irriii 


Evaluating  the  transform  at  equally  spaced  frequencies 
results  in: 


varlXj^CeJ'^)}  = ^ 


k27r 


k = 0,  I, 


1. 


Likewise 


varlXjle'^"’))  = 


T 


kZv 
= 1" 


k = 0,  1, 


1. 
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Expected  Value  of  Magnitude  and 


Squared  Magnitude  of  the  Fourier  Transform 


The  frequency  magnitude  is  equal  to  the  square  root  of 
the  sum  of  squares  of  two  Independent,  normal  random 
variables.  It  thus  has  a x distribution: 


IX|  = (Xp  + -0  = -L"  k = 0,  1 L - 1 


The  expected  value  of  |x  | is  found  by  evaluating  the 
integral : 


E{|X|}  = A 


La 

r 


x2e-xVLa 


2 

dx 


Evaluating  gives 


E{  |X(e'^'" 


k = 0,  1 L - 1 
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distribution 


Its 


The  squared  magnitude  will  have  a x 
expected  value  is  given  by 


E{|X1^}  = -L 


Evaluating  gives: 


E(  (X(e'^‘'’)  1^1  = La^  ~ k = 0,  1 , . . . , L - 1 

This  last  expression  can  also  be  obtained 
Parseval's  relation  for  the  Discrete  Fourier  Transform: 


L-1  - ,L-1  » 

I xMk)  = r ):  ix{or 

k=0  ‘"1=0 
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Section  II 


Current  hesults  on  Dual  Input  Nonstationary 


Noise  Suppression  Dsinp:  LMS  Adaptive  Noise  Cancellation 


Dennis  Pulsipher 


INTRODUCTION 


1.  The  precedinp:  semi-annual  technical  report 
described  a dual  input  noise  suppression  technique  for  audio 
signals.  It  also  described  several  experiments  performed 
successfully  using  synthetic  data,  data  which  was  forced  to 
comply  with  all  of  the  underlying  assumptions.  The  results 
of  these  experiments  were  quite  encouraging. 

Since  that  time  the  research  has  continued  to  apply 
this  technique  to  actual  acoustically  recorded  noisy  speech. 
A brief  description  of  recent  results  and  conclusions 
comprise  this  section. 


OBSERVATIONS 

As  work  with  acoustically  recorded  data  with  high  level 
broad-band  noise  has  progressed  it  has  become  increasingly 
apparent  that  a factor  of  major  importance  is  the  length  of 
the  required  filter.  A look  at  the  impulse  responses  of 
typical  hard-walled  rooms  indicates  that  the  duration  of 
significant  energy  frequently  lasts  half  a second  or  more. 
(Figure  la).  Since  in  practice  the  source  of  the  noise  n is 
separated  from  both  the  location  of  the  reference  noise 
pick-up  and  the  noisy  speech  signal  pick-up,  it  is  apparent 
that  the  situation  is  not  completely  described  by  a single 
room  impulse  response. 


- 51  - 


TOP: 

BOTTOM: 


-4  ssec-i 


-I  0«7E-1 


-4  500C-1 


4 5«ec-l 


Typical  Impulse  Response  of  a Room  (6{T)) 

Typical  Noise  Cancelling  Filter  for  the  Same  Room  (H(T)) 


Comparison  of  a Typical  G(T)  and  H(T) 


Let  G j be  the  response  of  the  room  from  the  noise 
source  to  the  noisy  speech  signal  pick-up  position  and  G2  he 
the  room's  response  from  the  noise  source  to  the  reference 
noise  pick  up  position.  Then  the  filter  to  be  estimated  is 
neither  Gj  nor  G2  but  G2  Gj^.  (Figure  2).  This  filter  may 
be  as  long  as  the  sum  of  the  lengths  of  G^  and  G2(less  one 
point ) . (Figure  1 b ) 


This  affects  significantly  the  estimation  of  the  noise 


samples  Uj  . 


The  optimal  noise  estimate  Uj  is 


h.  = I (j-i)v(i) 

'>  i=oo 


In  actual  practice,  though,  the  filter  estimated  is 

m - 

u.  : 

■^1=0 

The  error  in  this  estimate  is  due  to  two  factors,  the 
first  is  the  difference  caused  by  a finite  length  filter. 
This  error  can  be  reduced  by  making  the  filter  longer.  The 
second  factor  is  the  error  in  filter  coefficient  estimation, 
by  increasing  the  number  of  filter  taps,  the  variance  on  our 
noise  estimate  may  in  fact  increase,  and  thus  degrade  our 
system  performance.  That  is,  if  we  increase  the  number  of 
filter  coefficients  we  must  also  increase  the  accuracy  of 
estimation  of  the  coefficients  or  the  improvement  in 
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V 


Figure  2 

Data  generation  model 


/ 


/ 


performance  caused  by  lenRtheninp  the  filter  may  be 


outweighed  by  the  degradation  caused  by  m ir.  s -er  t i ma  t i on  of 
coefficients  . 

This  increased  accuracy  required  by  longer  filter 
lengths  may  be  achieved  by  decreasing  the  rate  of 
adaptation. 

Since  estimating  long  filters  reouires  more  computation 
and  since  a slow  adaptation  rate  requires  processing  more 
data,  it  has  been  found  that  performing  such  experiments  is 
a time  consuming  process  using  the  non-real  time  simulation. 
Present  results  indicate,  that  for  broadband  noise,  a 
minimum  cf  ICdb  noise  reduction  using  a .4  sec.  adaption 
filter  is  possible.  For  highly  correlated  noise  much 
shorter  filters  (on  the  order  of  100  taps)  can  be 
successfully  used  giving  considerably  better  noise 
rejection . 

Additional  experiments  with  longer  filters  and  slow 
adaptation  rates  are  currently  being  performed.  Effort  is 
also  being  put  into  determining  the  optimal  filter  length 
and  adaptation  rates  for  different  types  of  noise  and 
different  channels,  and  in  the  actual  helicopter  operatinr 
environment . 


it 
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SFCTION  III 


ESTIMATION  OF  THE  PARAMETERS  OF  AN 
AUTOREGRESSIVE-MOVING  AVERAGE  PROCESS 
IN  THE  PRESENCE  OF  NOISE 


William  J.  Done 


I 

i 

•« 
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INTRODUCTION 


Two  important  aspects  of  vocoder  development  have  been 
the  achievement  of  hi^h  quality  synthetic  speech  and  the 
development  of  low  bit  rate  communications  systems.  Linear 
prediction  (LP)  vocoders  have  gained  importance  in  both  of 
these  areas.  However,  evaluations  of  the  quality  and 
intelligibility  of  LP  and  other  vocoders  are  usually 
performed  with  high  quality  speech  inputs  undegraded  by 
background  noise.  When  noise  is  added  to  the  speech  signal 
prior  to  analysis,  the  intelligibility  and  quality  of  the 
synthetic  speech  derived  from  the  vocoders  are  degraded,  the 
results  often  being  unacceptable.  In  LP  vocoders,  the 
addition  of  noise  causes  problems  in  four  areas:  1)  silence 
detection,  2)  voiced/unvoiced  determination,  3)  pitch  period 
calculation  if  voiced,  and  U)  spectral  matching  errors. 
McAulay  [10]  has  addressed  problems  1),  2),  end  3)* 
Spectral  matching  errors,  which  result  from  the  Inaccurate 
identification  of  the  linear  prediction  parameters 
(autoregressive  parameters)  due  to  the  effects  of  noise, 
will  be  the  primary  emphasis  of  this  research.  Results 
illustrating  the  spectral  degradation  due  to  additive  white 
noise  will  be  presented. 

Given  a time  series  that  can  be  successfully  modeled  as 
a parametric  process,  such  as  an  autoregressive-moving 
average  (ARMA)  process,  there  are  primarily  two  approaches 
that  can  be  taken  to  alleviate  the  distortion  due  to 

- 55  - 


spectral  matching  errors  caused  by  noisy  data.  The  most 
often  used  technique  is  to  suppress  the  noise  prior  to 
analysis,  the  analysis  methods  depending  upon  the  parametric 
model  in  use.  This  prefiltering  technique  is  thus  seen  as  a 
two  step  process:  noise  removal  followed  by  parameter 
analysis.  Usually  the  parameter  analysis  step  remains 
unchanged  from  the  noiseless  case.  Examples  of  prefiltering 
include  the  various  applications  of  Wiener  filtering, 
adaptive  noise  cancelling  techniques,  or  linear  filtering 
techniques  used  to  eliminate  frequency  bands  dominated  by 
noise  . 

The  second  approach  to  parameter  estimation  in  the 
presence  of  noise  involves  the  construction  of  a new  model 
which  explicitly  accounts  for  the  effects  of  the  noise. 
Rather  than  the  two  stage  technique  of  noise 
suppression--parameter  extraction,  the  modeling  approach 
requires  a one  step  system  in  which  the  analysis  methods  for 
parameter  extraction  are  significantly  changed  from  the 
analysis  methods  used  when  no  noise  is  present.  The 
modifications  are  required  to  account  for  the  changes  in  the 
parametric  model  as  a consequence  of  adding  the  noise.  In 
the  case  of  an  autoregressive  (AR)  process  corrupted  by 
additive  Gaussian  white  noise,  the  modeling  approach  for 
parameter  estimation  is  especially  appealing. 
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both  approaches  have  advantages  and  disadvantages.  If 
a model  is  successful  with  high  quality  signal  inputs  and 
algorithms  for  parametric  estimation  are  well  established, 
then  prefiltering  may  be  desirable  when  noise  is  present. 
It  offers  the  advantage  of  retaining  the  original  parameter 
estimation  algorithm  with  little  or  no  alteration, 
r re f i 1 t er ing , however,  may  have  a side  effect  of  distorting 
the  characteristics  of  the  desired  signal  in  the  process  of 
suppressing  the  noise.  An  example  of  this  could  be  low  pass 
filtering  of  the  noisy  data  in  a situation  where  the  noise 
becomes  dominant  above  some  frequency  fL.  Although  the 
filtered  signal  may  have  an  improved  signal-to-noise  ratio 
(ShR),  some  information  contained  in  the  signal  is  lost. 
Another  possible  disadvantage  of  the  prefiltering  approach 
arises  when  the  system  is  to  be  used  in  an  environment  where 
the  noise  characteristics  are  changing,  here  decisions  must 
be  made  about  the  structure  of  the  prefilter:  should 
several  types  of  filters,  each  matched  to  a type  of  noise, 
be  used  or  should  an  adaptive  filter  structure  be  used. 

If  the  time  series  to  be  characterized  is  successfully 
modeled  by  a parametric  representation  like  the  AR  model, 
then  the  modeling  approach  to  parameter  extraction  may  be 
desirable  if  the  effects  of  the  noise  can  be  Included  in  a 
modified  model.  The  primary  disadvantage  of  the  modeling 
approach  is  the  need  to  change  the  parameter  estimation 
procedure,  requiring  the  development  of  new  algorithms.  For 
the  slgnal-ln-noise  model  discussed  here,  the  addition  of 
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noise  to  an  AR  process  results  in  an  ARMA  process.  Because 
of  the  nonlinear  relationsips  involved,  the  task  of 
identifying  the  parameters  of  an  ARMA  process  is  much  more 
difficult  than  identifying  the  parameters  of  an  AR  process. 
In  addition,  there  is  the  validity  of  the  parametric  model 
for  the  time  series  being  considered,  which  may  pose  a 
limitation  on  the  modeling  technique.  This  approach  also 
shares  the  disadvantage  that  occurs  in  nonstationary  noise 
situations  . 

This  section  will  describe  a technique  in  which  an 

autoregressive  process  of  order  q,  AR(q),  with  AR  parameters 
q 

{a(i))i  is  Identified  when  corrupted  by  additive  Gaussian 
white  noise.  It  will  be  shown  that  the  additive  noise 
changes  the  time  series  from  an  AR  process  to  an  ARMA 
process,  from  which  the  q original  AR  parameters  can  be 
identified.  Preliminary  results  are  given  on  the  initial 
efforts  to  implement  the  algorithms  necessary  for 
determination  of  the  desired  AR  coefficients. 

As  stated  above,  one  limitation  of  the  modeling 
approach  is  the  validity  of  the  parametric  model,  an  AR 
process  in  this  case.  Voiced  speech  has  been  successfully 
"modeled"  by  an  all-pole  LPC  process.  However,  if  voiced 
speech  is  assumed  to  be  an  AR  process  and  identification  of 
the  AR  parameters  is  desired,  the  periodic  or  seasonal 
component  can  hinder  the  "identification"  of  the  AR 
parameters.  The  parametric  model  must  be  modified  to 
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account  for  the  presence  of  the  periodic  component, 
introducing  the  concept  of  autoregressive-  integrated  moving 


average  (ARIMA)  processes.  Extensions  of  the  parametric 
model  for  the  signal  to  include  ARI,  ARMA,  and  ARIMA  forms 
will  be  considered.  The  ARI  model  for  voiced  speech 
waveforms  will  be  emphasized.  This  model  will  be  studied  in 
both  noiseless  and  noisy  conditions. 

SYSTEM  DESCRIPTION 

In  this  section,  the  procedures  for  parameter 
estimation  in  the  presence  of  noise  will  be  discussed.  The 
effects  of  additive  noise  on  LPC  parameters  will  be 
described  and  a detailed  presentation  of  the  mathematics 
defining  the  ARMA  model  will  be  given.  Of  the  four  parts  to 
follow,  part  one  will  present  the  algorithm  for  linear 
predictive  coding.  The  effects  of  additive  noise  upon  the 
LPC  parameters  can  then  be  observed.  The  popular  LPC 
approach  can  also  be  compared  with  the  ARMA  model  technique. 

Part  two  of  the  system  description  will  present  the 
ARMA  model  algorithm  suggested  by  Pagano  [11].  The 
mathematical  details  of  this  technique  are  given.  In 
developing  the  ARMA  model  approach,  the  initial  step  is  a 
stage  for  estimating  the  ARMA  parameters.  Once  these 
estimates  are  available,  a nonlinear  regression  modifies  the 
Initial  estimates.  Part  three  will  discuss  possible  ARMA 
estimation  procedures  and  the  nonlinear  regression 


59 


technique.  The  last  part  will  Include  a discussion  of 
important  software  not  Included  directly  in  the  operation  of 
the  LPC  or  ARKA  model  analysis  systems.  Also  discussed  in 
this  last  part  is  the  data  base  used  for  analysis. 

Linear  Predictive  Coding 

If  s(k)  is  a time  series  which  can  be  modeled  as  a 
qth_order  autoregressive  process,  AR(q),  then 

q 

s(k)  = - );  a,(i)s(k-i ) + £(k),  1) 

i = 1 

q 

where  (ai(i))  ^ are  the  AR  parameters  and  c(k)  is  a zero  mean 
white  noise  process.  In  developing  the  expressions 
characterizing  LPC  and  presenting  only  the  autocorrelation 
method  of  analysis,  the  equations  are  much  more  compact  if 
matrix  notation  is  used.  Refer  to  Makhoul  [9]  for 
additional  background  and  a list  of  references  for  LPC 
development.  The  development  of  a notatlonal  convention  for 
LPC  using  a matrix  formulation  can  be  found  in  Boll  [2]. 

Using  the  autocorrelation  method,  the  sequence  s(k)  has 
infinite  extent  but  is  nonzero  only  for  0 < k < N-1,  where  N 
is  the  size  of  the  analysis  window.  Form  the  (N-fq)  x 1 
vector  s,  where 

1 - [ s(0)  s(l)  ...  s(N-l)  0 ...  0 ]^.  2) 

using  0 as  a delay  operator  for  vector  notation,  D^s  is  fin 
(N-fq)  X 1 vector  with  the  sequence  s(0)  ...  s(N-1)  beginning 
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at  the  (l-fHth  position.  The  superscript  1 can  take  on  the 


values  1,  q.  For  example, 

= [0  s(0)  s(l)  ...  s(N-l)  0 ...  0]^, 
d\  = [0  0 s(0)  s(l)  ...  s(N-1)  0 ...  0]^,  and 
D%  = [0  0 ...  0 s(0)  s(l)  ...  s(N-l)]^. 


Form  the  ( N-f q ) x q matrix  by  appending  as  columns  the 
D^s,  1 = 1,  ...»  q 

H = [D^S  D^s  D^s  ...D'^S  ].  3) 

— s — — 

If  an  error  sequence  £ is  defined  as 

c = [ e(0)  t(l)  ...c(N-l)  t(N)  ...e(N+q-l)  f,  4) 

then  1)  can  be  written  as 


5) 


where  “ C *^(1)  3^(2)  ...  ai(q)  is  formed  from  the 
prediction  coefficients  and  the  index  k in  1)  is  confined  to 
the  interval  0 £ k £ N-«-q-1.  In  LPC  the  measure  of  closeness 
of  fit  is  the  least  squares  minimization  of  the  energy  in 
the  error  signal  £,  as  a function  of  the  {a^(i}}p  If  the 
loss  function  Le  is  defined  as 
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N+q-l- 

= I e^(k) 

k»0 


T 

e e. 


6) 


then  the  minimum  of  with  respect  to  the  {a^(i)}^  is 

found.  Using  vector  calculus, 


. 9 To''’  1 - 9 T 9e 
^ ^ Li  £]  = 2t  - 


- 9£^ 


The  minimum  of  is  obtained  by  setting  this  expression 
equal  to  zero, 


H,  - 0. 


From  5),  |^  = H e^H  = 0 > o'' 

9a^  -S  S 


H^e  = 0. 


7) 


Substituting  5)  into  7)  gives 


h!  S + a 

-s  - -S  -S  — I 


= 0 


or 


h'  H a * -H  S 
s s 1 ^ 


8) 


1 
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Note  that  the  matrix  and  the  vector  are  equal  to 


r 


H = 

-s  -s 


Rs,(0)  R„(l) 


%s<'>  "ssO” 


»ss 


"ss  '’> 

R„  (2) 


‘-R55  (q) 


"ss 


R„  (q-2)  i 


• “ss'") 


9) 


N-1-|k| 

.here  R^^tk)  » J s(t)s(H|k|) 

matrix  equation  representation 

express  ions 


Equation  8}  is  a 
for  the  Yule-Walker 


J a,(l)R„(1-k)  . -R„(k). 


i-l 


k - If  • • • « q • 


10) 


If  the  sequence  s(k)  is  contaminated  by  additive  noise  to 
produce  the  series  x(k) 


x(k)  + s(k)  + n(k)  , 


11) 
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and  an  AR(q)  model  is  forced  on  the  noisy  data,  similar 


results  are  obtained.  The  AR  model  forced  on  the  noisy  data 
i s 


x(k)  . . I 

i = l 


ap(i)x(l(-i)  + f(k) 


12) 


where  the  { a 2( i ) ) ^ are  the  prediction  coefficients  and  t(k) 
is  the  resulting  error  sequence.  If  the  matrix  Jl^  and  the 
vector  X are  formed  from  the  data  x(0),  ...»  x(N-1)  in  a 
manner  similar  to  iij  and  then  the  loss  function  for 
the  noisy  data  is 


k=0 


T 

t £» 


13) 


with  [«{0)  c(l)  ...  e(N+q-l)]^  Minimizing  Wi.th 

respect  to  the  ja^Cl))^  results  in 


a,  = -H^x 
-x-x-2 


14) 


as  the  expression  defining  the  least  squares  estimate  for 
the  {82(1))^  defined  in  12).  The  elements  of  the  matrix 
and  the  vector  H^Jx  are  formed  from  the  autocorrelation 

function  of  x(k)  as  in  9)  with  s(k)  replaced  by  x(k).  The 
q 

(a^(i))‘l  represent  the  LPC  coefficients  determined  from  the 
undegraded  signal,  while  the  |a2(i))^  are  the  LPC  parameters 
obtained  from  noisy  data,  with  no  attempt  made  to  eliminate 
the  effects  of  noise. 
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Constructing  the  matrix  and  the  vector  n.  from  the 
additive  noise  sequence  n(k),  the  following  relationships 
hold  : 


15) 


mIh  = + hLl  + hJil  + Hi- 

— X— X —s—s  — n— ti  — s— fi  — n— s 


16) 


The  H H term  is  a matrix  formed  of  the  autocorrelation 
-n-n  ^ 

terms  of  n(k)  and  the  terms  H H and  h' H „ contain  the 

-s-n  -n-  s 

crosscorrelation  terms  between  n(k)  and  s(k).  If  it  can  be 
assumed  that  s(k)  and  n(k)  are  uncorrelated,  16)  becomes 


«!!!«  = hIh,  * 


17) 


With  11),  15),  and  17)  substituted  into  1M),  we  have 


[hJh 


* hXJ  * I) 


-tali  * ^ 


ala] 


18) 


where  the  assumption  of  uncorrelated  signal  and  noise  is 
used  to  reduce  the  right  hand  side  of  18).  Solving 
equations  8)  and  18)  for  a^^  and  , respectively, 


i,  ■ -[al!i,)-'H;s 

and 


19) 
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The 


define  the  least  squares  estimates  for  and 

vector  a^2  can  be  related  to  a-i  by  pretnultiplying  18)  by 
to  give 


a,  - [H^]-'  Hjn 


or 


aj  = Cl  a,  - [I  . (hJh^)->  h^h^]-'  ||T„ 


CH^Cis  + hJHjI,  - 


21) 


From  21)  it  is  apparent  that  the  addition  of  n(k)  has 


degraded  the  a^  in  two  ways: 

n a bias  term  [hJh  + hIp 

-s-s  -n-n  -fv- 

subtraoted; 


has 


been 


2}  the  relative  magnitudes  of  the  (a^Ci))^  have  been 
changed  due  to  the  matrix  multiplying  effect  of 
the  expression  CUs-s  * — n— ^ — S— S ' 

The  results  of  equations  18)  through  21)  are  valuable  in 
showing  the  distortion  possible  when  noise  is  added  to  a 
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sequence  that  is  to  be  the  input  to  an  LPC  system.  These 
results  are  based  on  the  explicit  assumption  that  s(k)  and 


n ( k ) arc 

uncorrelated 

and  fail  to 

account 

for 

nonzero 

crossccrrelation 

terms 

(the  terms 

1 J U , 
-s-n 

etc 

. ). 

Results 

showing? 

the  distortion 

introduced  by 

n ( k ) 

on 

the 

inverse 

spectrum 

derived 

from 

the  la^  ( i ) }^ 

and 

the 

effects  of 

assumincr 

n ( k ) and 

s ( k ) 

are  uncorrelated  will 

be 

shown 

in  the 

section  on  Preliminary  Results. 


ARMA  Model  Approach 


r 


Inclusion  of  the  effects  of  additive  white  noise  upon 
an  AR(q)  process  is  discused  in  [3].  [11]i  and  [ 1 *»  ] . The 
potential  advantage  of  this  approach  is  to  include  the  noise 
effects  explicitly  in  a more  general  model  than  the  original 
AR(q)  process.  The  model  is  developed  on  the  following 
assumptions : 

1)  s(k)  is  a proper  AR(q)  sequence  described  by 


I a(i)s(k-1)  = e(k) 
i=0 

I 


22) 


for  a(0)  = 1,  a(q)  ^ 0,  and  q ^ 1,  with  '(k) 

independent,  identically  distributed  (l.i.d.) 

2 

N(0,  ) and  S(k)  stationary; 

2)  s(k)  is  contaminated  by  n(k)  to  form  the 
observable  data  x(k), 


x(k)  = s(k)  + n(k), 

where  s(k)  and  n(k)  are  Independent  and 
i.i.d.  N(0,  ). 

q 2 9 

The  model  has  q + 2 parameters  — {a(l)}i,  o , and  of,. 

I E '' 

available  for  analysis  to  determine  estimates 
parameters  is  the  sequence  x(0),  x(N-1). 


23) 

n ( k ) is 

The  data 
of  these 


j 

J 
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Combining  22)  and  23),  we  have 


9 

I a(i)n(k-i)  + G(k),  a(0)  = 1. 

i=0 


24) 


A sequence  y(k)  is  defined  as 


y(k)  = J a(i)x(k-i) 
i=0 


25) 


or 


y(k)  = ^ a(i)n(k-i)  + e(k). 

t=0 


26) 


If  = ^[y(^  shown,  using  26), 

that  Ry,(k)  = 0 for  |k|  > q.  From  26),  y(k)  is  seen  to  be 
stationary.  Combining  this  with  the  property  that 
R^y(k)  = 0,  |k|  > q,  shows  y(k)  to  be  a moving  average 


sequence  MA(p),  with  P 1 q* 


Also 


from 


26)  , 


Ryy(q)  « a^a(q)  ^ 0 » by  the  hypothesis  under  assumption  1) 

above.  As  a result,  y(k)  is  a MA(q)  process,  and  there 

2 

exists  a sequence  of  random  variables  n(k),  i.i.d.  N(0,f’  ) 


and  constants  {b(i)}^  such  that 


y(k)  = 1 b{1)n(k-i).  b(0)  = 1. 

i>0 


27) 
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Combining  25)  and  27)  gives 


? a(i)x(k-i)  = I b(i)n(l(-i),  a(0)  = b(0)  = 1 28) 

i=0  1=0 

and  the  sequence  x(k)  can  be  viewed  as  an  ARtlA(q,q)  process. 

V.hile  the  original  model  has  q + 2 parameters  — {a(i)}^,  ^ 

2 Q Q 

and  op--the  new  model  has  2q4-1  parameters-- { a { i ))  ^ , {b(i))p 

2 

and  o ^ , From  27  ) , 


(k)  - ol 


.-IM 

i=0 


b(i)b(1+k). 


b(0)  = 1. 


29) 


so  the  expanded  parameter  set  could  equivalently  be 
expressed  as  {a{i))?  and  {Ryy(i))^, 

Using  the  definition  for  y(k)  in  26), 


(k) 


o^6(k)  + o 


a(i)a(1+k). 


a(0)  = 1, 


where  {(fc) 


It  k = 0 

0,  k f 0 • 


30) 


Ihus,  the  addition  of  n(k)  to  s(k)  produces  the  following 
relationships  between  the  paramt  ■■  ers  : 

1)  equation  29)  gives  the  autocorrelation  function 
fiyy(k)  for  any  MA(q)  process; 

2)  another  definition  for  kyy(k)  given  in  30)  arises 
as  a result  of  the  noise  model  defined  by  22)  and 
23)  ; 
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3)  the  ARMA  parameters  {a(i)}^  and  {b(i))^  for  the 
process  x(k)  are  related  through  the 
autocorrelation  function  Ryy(k),  the  relationship 
being  expressed  by  29)  and  30). 

A comparison  of  the  ARMA  model  approach  just  described 
with  a forced  LPC  fit  of  the  data,  represented  by  the 
solution  of  20),  shows  two  interesting  facts.  First,  the 
forced  LPC  model,  from  a spectral  point  of  view,  must  match 
the  spectral  characteristics  of  the  input  x(k)  as  closely  as 
possible.  This  spectral  match  includes  those 

characteristics  introduced  by  the  noise.  The  next  section 
will  present  examples  illustrating  the  flattening  effect 
white  noise  has  on  a forced  LPC  fit.  The  second  observation 
involves  the  assumption  of  the  model  form.  If  the  original 
sequence  s(k)  is  AR(q),  then  the  addition  of  white  noise 
results  in  an  ARMA(q,q)  process,  x(k).  This  process  is 
equivalently  an  AR(“>)  process.  The  forced  LPC  fit  is 
actually  representative  of  the  first  step  in  the  process 
discussed  in  [5]  for  estimating  ARMA  parameters,  that  is, 
underfitting  the  AR(«>)  process.  The  ARMA  model  approach  can 
then  be  viewed  as  a procedure  by  which  the  AR(q)  and  MA(q) 
parameters  are  estimated  from  the  AR(«)  parameters. 

Having  identified  the  mathematical  relationships  and 
properties  for  the  ARMA  approach  to  parameter  estimation  in 
the  presence  of  noise,  the  next  phase  describes  the 
procedure  for  estimating  the  AR  parameters  of  the  original 


I 
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signal  s(k).  The  first  step  requires  the  estimation  of  the 

q q 

ARMA  parameters  of  the  expanded  model--{S'(  i ) } i , (b(i)}^, 

_2 

and  --the  symbol  indicating  estimate.  This  represents 

the  identification  of  the  series  x(k)  according  to  28). 
This  set  of  parameters  is  then  transformed  to  the  equivalent 

q — q 

set-- {"aC  1 ) ) 1 and  t ^yy  ( i relationship  defined  in 

29).  At  this  stage  there  are  estimates  for  the  original  AR 

q 

parameters  {a(l)}^.  The  efficiency  of  these  estimates  will 
depend  upon  the  technique  used  to  estimate  the  parameters  of 
an  ARMA  process.  The  estimation  procedure,  however,  has  not 
yet  used  the  information  available  under  assumption  2)  of 
the  model.  This  information  is  carried  in  the  relationship 

q q 

given  by  30)  between  {Ryy(i)}Q,  on  one  hand,  and  {a(i)}i, 

2 2 

o ^ , and  a^,  on  the  other. 

Using  nonlinear  regression  theory,  refinements  are  made 
on  the  estimates  for  the  original  parameter  set  according  to 

z = f(e)  + e, 

where  I = [a(l)  a{2)  ...  a(q)  ^yy(®)  ^yy^^^ 

and  £ “ [ — a(q)  ]^.  The  {iTCi)}?  and 

q G n I 

{a(i))^  in  2 and  G , respectively,  are  estimates  for  the  AR 

q 

parameters.  The  {a  ( i ) ) result  from  the  ARMA  parameter 

estimation  step.  Then  the  nonlinear  regression  stage 

q 

produces  the  {a(i))^.  In  31)  the  nonlinear  functional 
relationship  [_(  • ) represents  that  of  30),  and  e is  a 
(2q4-1)  X 1 vector  reflecting  the  error  between  the  ARMA 


31) 
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parameter  estimates 


found 


in  z and  the  theoretical 


relationship  to  these  estimates  represented  by  30).  f(-) 
maps  from  the  original  q+2  parameter  set  to  the  2q+1 
parameter  set  of  the  expanded  model.  The  nonlinear 
regression  technique  attempts  to  find  the  estimate 
of  £ which  will  minimize  e^  in  the  least  squares  sense.  The 
Gauss-Newton  method  is  proposed  for  this  task  [11]  and  will 
be  discussed  in  the  next  part. 


Algorithms 

In  developing  the  ARMA  approach  in  the  previous  part, 
two  steps  require  major  algorithms:  the  estimation  of  ARMA 
parameters  from  a time  series  and  a nonlinear  regression 
technique.  The  nonlinear  regression  (NLR)  will  be  presented 
first.  Equation  31)  describes  the  nonlinear  relationship 
between  the  parameter  sets  ^ and  0.  The  2q-f1  equations 
comprising  31)  can  be  written  as 


2t  = ®t’  t = 1,  ....  2q+l. 


32) 


The  metric  [8]  for  evaluating  the  effectiveness  of  o in 
minimizing  £ is  given  by 


2q+l 

Using  30)  to  define  the  f^ ( • ) » t = 1, 
following  set  of  equations: 


33) 


, 2q'«-1,  gives  the 
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i = 1,  .... 


q 


a(i)  = a{i)  + e.. 


R (0)  = 

yy 


'\-2 

o 

e 


-^2 


q 

I 

i=0 


'v<2/ . V 

a (i) 


+ e 


q+r 


34) 


V*  *VW,  " 

q -2 

where  a(0)  = 1.  The  {a(i)}i,  o , 
minimize  Q(0 ) , 


1,  •••>  q* 


and  o are 

n 


chosen 


to 


Q(e)  = 


2q+1 


i = l 


35) 


An  iterative  procedure  based  on  the  Gauss-Newton  method  or 
modified  Gauss-  Newton  method  will  yield  a solution  0 to  32) 
having  the  properties  of  convergence  for  a finite  number  of 
functional  relationships  f^  ( • ) i and  the  0 will  be 
asymptotically  efficient  [7].  The  Gauss-Newton  method  is 
based  on  the  linearization  of  the  nonlinear  functions  ft  ( * ) 
about  the  solution  Q. 


The  second  of  the  major  algorithms  to  be  discussed  is 
the  method  for  obtaining  estimates  for  the  ARMA  parameters 
of  a time  series.  The  method  used  will  probably  be  based  on 
one  of  following  techniques:  Hannan  [6];  Graupe,  Krause, 
and  Moore  [5];  or  Steiglltz  [12].  Hannan's  technique  is  a 
three  step  procedure  adaptable  to  iteration  if  desired.  The 
following  is  a brief  summary  of  the  procedure.  Notation  is 
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based  on  that  used  previously  in  this  section. 

1)  Initial  estimates  for  the  AR  parameters.  Uslnj? 
the  property  that  Ryy(k)  = 0 for  | k|  > q for  a 
MA(q)  process  y(k),  estimates  of  the  AR 

q 

coefficients  {aCi)}]  are  found  from  the  solution 
of 


).  a(i)R  (i-k)  = - R^^{k),  k = q+1 2q. 

^ _ 1 A A A A 


with 


N-1- 


= I x(i)x(i+k) 
i=0 

Initial  estimates  tor  the 


k = 0 ?q  . 

MA  parameters . 


Form  R (k)  = );  ? a(i)d(j)R  (k+i-j). 

i=0  j=0 

Note  that  this  is  the  autocorrelation  function 


36) 


of 


v(k)  obtained  by  usinK  25).  If  the  power  spectrum 
estimate  for  y(k),  SyyCw),  is  non-neRative , it  can 

q 

be  factored  to  give  {b(i)}^,  estimates  of  the  MA 
parameters.  This  is  essentially  the 

Autocorrelation  Partial  Realization  method  of 
Atashroo  [ 1 ] . If  Syy(aj)  ^0,  0 < w < tt  • then 

technique  proceeds  to  obtain  the 


Hannan's 

q 

{b(l))]  through  several  intermediate  steps. 

3)  Refine  the  initial  ARMA  estimates.  The  initial 

_ q - R 

estimates  {a(i))i  and  {b(l))^  are  modified  by 
calculating  correction  factors  which  are  added  to 
the  initial  estimates  found  in  step  2)  above.  The 

q 

procedure  is  to  first  modify  the  {a(i)),,  then  the 

q 

MA  parameters  {b(i)}i  , followed  by  a final 
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q 

modification  of  the  {a(i))i.  Step  3)  can  be 
repeated  to  form  an  iterative  approach. 

The  iterative  procedure  of  Step  3)  is  associated  with  the 
solution  of  equations  29)  and  36),  while  the  Gauss-Newton 
NLR  procedure  is  associated  with  the  solution  of  30). 
Hannan's  technique  is  based  upon  Fourier  transformation  of 
the  data  and  carries  a large  computational  burden. 

The  technique  presented  by  Graupe,  et.  al.,  [5],  has 
the  advantage  of  using  only  linear  operations.  The  process 
is  summarized  as  follows: 

1)  Given  an  ARMA(q,p)  process  with  AR  parameters 

P p 

{a(l))^  and  MA  parameters  {b(l)>|  , consider  this 

to  be  an  AR(“‘)  process  with  parameters  (ad)}^. 

From  the  data,  estimate  the  {3(i))^  for  some  large 

integer  value  for  N. 

2)  Using  the  expressions  generated  by  the 

M q 

relationship  between  the  {9(i))i,  {a(i))^  , and 

p 

{b(l))p  a process  is  obtained  for  estimating 

first  the  {b(l))^  and  then  the  {ad))*^.  Both 

1 1 

stages  involve  the  solution  of  a system  of  linear 
equations . 

This  method  is  computationally  more  appealing  than  Hannan's 
method.  However,  the  value  of  N,  which  determines  the  order 
of  the  ad)  approximation,  may  be  rather  large  for  some 
processes  in  which  the  zeros  of  the  MA  filter  are  near  the 
unit  circle. 
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Steiglitz's  [ 12]  method  for  1 doii  1 1 I’y  1 ng  the  AH  and  MA 
parameters  of  a process  is  based  on  the  mode  1 iterative 
scheme  for  system  identification  by  Steiglitz  and  McBride 
[13]*  Steiglitz's  approach  to  identifying  the  ARMA 
parameters  is  the  following: 

1)  Assume  the  sequence  to  be  identified,  x(lc),  is  the 
output  of  the  unknown  system. 

2)  Assume  the  input  to  this  system,  u(k),  is  a 
Kronecker  delta  function. 

3)  Minimize  the  error  criterion 


N-1 


I 

k=0 


37) 


by  the  appropriate  choice  of  A(z)  and 
z-transforms  of  the  denominator 

q 


B( z ) , the 

parameters 

P 


{a(i)}j,  and  numerator  parameters  { b ( 1 ) } , 

respectively.  U(z)  and  X(z)  are  the  z-transforms 
of  the  unknown  system's  input  and  output 
sequences,  respectively.  ^ z-transform 

of  an  initial^guess  for  the  denominator  filter. 

4)  By  replacing  Aq ( z ) with  the  A(z)  determined  in 
step  3),  new  estimates  for  A(z)  and  B(z)  can  be 
found.  This  iterative  procedure  can  then  be 
continued  until  the  desired  accuracy  is  obtained. 

q p 

The  solution  of  37)  for  the  {a(i))i  and  the  {b(i)}o  involves 
the  solution  of  a system  of  q4'P4’1  linear  equations. 
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Software 


The  last  part  of  this  section  describes  the  software 
needed  to  generate  the  data  base  used  for  simulating  an  AR 
process  corrupted  by  additive  white  noise.  The  data  derives 
from  two  noise  files  obtained  by  digitizing  the  output  of  an 
analog  noise  generator.  The  analog  signal  is  prefiltered 
with  a low  pass  filter  having  a 3«2  kHz  cutoff  frequency  and 
is  sampled  at  6667-  Hz.  From  two  files  generated  in  this 
manner,  one  is  scaled  so  that  its  sample  variance  is 
approximately  1.0.  This  sequence  is  p(k),  the  excitation 

sequence  for  the  AR  process  to  be  simulated.  The  second 

noise  file  n^Ck)  is  used  to  generate  n(k),  the  additive 
noise  . 

A two  step  process  generates  the  data  base  for  a 

particular  AR  process. 

1)  Design  and  generation  of  the  AR  process.  Using 

the  program  ARPGEN.SAV,  the  user  designs  an  AR(q} 

process,  1-1  q L 20 , The  filter  is  checked  for 

stability  by  analyzing  the  partial  correlation 
coefficients  derived  from  the  AR  parameters.  The 
power  spectrum  of  the  Inverse  filter  corresponding 
to  the  AR  coefficients  is  calculated  and 

displayed.  If  the  model  is  acceptable,  the  time 
series  s{k)  corresponding  to  this  process  is 

computed  using  the  noise  sequence  e(k)  mentioned 
above  as  the  excitation  to  the  AR  filter. 


78 


2)  Generation  of  the  additive  noise.  After  finding 
approximations  to  the  variances  of  s(k)  generated 
in  1)  and  the  noise  file  n^Oc)  described  above, 
the  program  SPLUSN.SAV  can  be  used  to  generate 
either  of  the  outputs 

n(k)  = c • nQ(l() 

or 

x(k)  = s{k)  + c • nQ(k). 

The  constant  c is  computed  by  SPLUSN.SAV  so  that 
the  quantities  c*  nQ(k)  and  s(k)  will  have  a 
specified  signal-to-noise  ratio. 

Using  the  above  system  of  data  and  the  two  programs 
discussed,  various  AR  processes  and  additive  noise  sequences 
can  easily  be  synthesized  in  preparation  for  analysis  by  the 
ARMA  noise  model  approach. 
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PRELIMINARY  RESULTS 


One  of 

the 

objectives 

of  this 

research  is 

to 

characterize 

the 

effects  of 

additive 

white  noise  on 

LPC 

analysis  systems.  The  following  data  illustrate  the 
degradation  caused  by  additive  noise.  Results  are  presented 
for  a speech  waveform  sample  at  varying  levels  of  noise. 
Figure  1 shows  the  speech  frame  used  as  the  example  for  this 
section.  The  time  waveform  is  shown  in  Fig.  la).  Sampled 
at  6667.  Hz,  this  frame  of  128  samples  corresponds  to  about 
19  msec.  of  speech.  This  frame  represents  a portion  of  the 
schwa  vowel  /a/  in  the  word  "rust".  This  particular  vowel 
was  selected  because  of  the  nearly  uniform  distribution  of 
formants.  Also,  the  formants  drop  in  peak  magnitude  at  a 
constant  rate  as  frequency  Increases  (on  a dB  scale).  Fig. 
1b)  shows  the  DFT  of  this  frame  of  speech,  after  windowing 
with  a Hamming  window.  On  the  dB  scale,  the  nearly  uniform 
formant  structure  of  the  schwa  vowel  is  apparent. 
Superimposed  on  Fig.  1b)  is  the  spectrum  corresponding  to  a 
10  pole  LPC  fit  of  this  frame.  The  LPC  spectrum  is  smoother 
and  matches  the  formant  peaks  well. 

Fig.  2,  a)-e),  show  the  effects  of  additive  white 


noise 

with 

progressively 

smaller 

signal-to-nolse  ratios: 

40,  30, 

20, 

10, 

and  0 dB. 

The  SNR  is 

found  by  averaging  the 

energy 

in 

the 

speech  and 

the  noise 

sequences  over  several 

seconds.  The  ratio  of  these  energies  is  then  used  to 
determine  the  SNR,  defined  as 
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Figure  1 : 


t 
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Example  frame  used  as  s(k) 

a)  128  samples  of  the  vowel  /a/,  sampled  at  6667.  Hz. 

b)  Spectrum  of  /a/  and  a 10  pole  LPC  fit  to  that 
spectrum. 


of  the  effects  of  additive  white  noise  on 
leech  frame  and  10  pole  LPC  approximations 
nq  spectrum. 
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Superimposed  on  each  spectral  plot  is  the  corresponding  10 
pole  LPC  fit.  All  spectral  graphs  in  Figures  1 and  2 are  on 
the  same  scale  and  can  be  compared  directly.  The  following 
noise  effects  are  noted: 

1)  with  decreasing  SNR,  the  noise  "floor"  rises, 
obscuring  more  of  the  formant  structure  of  the 
speech ; 

2)  the  formants  Identified  by  LPC  analysis  in 
increasingly  poorer  SNR's  tend  to  be  wider  in 
bandwidth  and  have  their  peaks  at  slightly  higher 
frequencies; 

3)  the  formant  structure  identified  by  LPC  is  badly 
degraded  for  SNR's  below  about  20  dB. 


The  Importance  of  the  assumption  of  uncorrelated  signal 
and  noise  is  demonstrated  in  the  next  set  of  data.  This 
assumption  is  primary  to  the  autocorrelation  correction 
methods  of  parameter  estimation  [1],  [ *1  ] . Figure  3a)  shows 
%s  (k),  the  autocorrelation  function  for  the  frame  of  speech 
being  discussed.  Plotted  in  Fig.  3b)  are  R^^^Ck),  the  noise 
autocorrelation  function,  and  (k),  one  of  the 
crosscorrelation  functions.  The  abscissa  in  Fig.  3a)  and 
b)  starts  at  lag  k s 0 and  is  followed  by  100  lags  for 
k s 1,  ...,  100.  The  last  100  points  are  the  negative  lags 
in  the  order  k * -100,  ...,  -1.  The  noise  used  for  Fig.  3 
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corresponds  to  a 10  dB  SNH.  It  is  obvious  that  0, 


based  on  the  estimation  of  **5p(*f)  from 
N-1 

Rs„(lc)  ' [ s(i)n(1+k). 
i=0 

The  spectral  implications  of  this  are  shown  in  Fig.  3c), 
which  shows  four  spectral  curves  determined  from  LPC 

coefficients  calculated  from  the  four  autocorrelations: 
i)  Rss(k), 

li)  = l!„(k)  - R„„(k)  - R^^(k)  - R^Jk) 

!»•)  = l>«,(k)  - R„„(k). 

iv)  Rxx(‘‘)- 

Note  that  1)  and  il)  result  in  the  same  spectral  plot.  The 
explicit  assumption  of  uncorrelated  signal  and  noise  is  used 
in  iii),  while  iv)  corresponds  to  LPC  coefficients 
determined  from  noisy  data,  with  no  correction  attempted. 
Fig.  3c),  curve  ill),  shows  the  inadequacy  of  the 
uncorrelated  assumption  for  the  autocorrelation  correction 
modeling  approach.  Even  though  curve  iii)  appears  superior 
to  iv),  in  a large  percentage  of  frames,  the  LPC  algorithms 


will  fail,  producing 

unstable 

inverse  filters. 

when 

the 

autocorrelations  of 

^ss  ^ ^ 

are  based  on 

ill) . 

An 

autocorrelation  matrix  which  is  not  positive  definite  causes 
this . 


The  preceding  results  show  LPC  analysis  procedures  are 
sensitive  to  additive  white  noise  degradation.  The 
autocorrelation  correction  method,  while  appealing,  is  not 


38) 
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Figure  3:  Comparison  of  autocorrelation  and  crosscorrelation 
sequences  for  the  sample  frame  with  a 10  dB  SNR. 

a)  R„(k) 

b)  R„„(k)  and  R^„(k) 

c)  10  pole  LPC  spectra  based  on  autocorrelations  i-iv 
discussed  in  text. 


successful  if  the  uncorrelated  signal-noise  assumption  is 


made  . 


Preliminary  Results  of  the  ARMA 
Model  for  Additive  White  Noise 

In  developing  the  ARMA  model  suggested  for  estimating 
AR  parameters  when  noise  is  present,  the  major  components  of 
the  procedure  suggested  by  Pagano  have  been  implemented. 
Initial  tests  of  the  algorithm  performed  on  speech  samples 
were  unsuccessful  in  either  proving  or  disproving  the 
technique  or  verifying  the  operation  of  the  software.  It  is 
this  result  that  led  to  the  used  of  synthetic  AR  processes 
for  testing  of  this  approach.  This  will  avoid  the  critical 
assumption  that  the  original  undegraded  time  series,  speech 
in  this  case,  can  be  modeled  as  an  AR(q)  process.  It  also 
led  to  a perturbation  analysis  as  a way  of  verifying  the 
correct  operation  of  the  nonlinear  regression  (NLR) 
software.  The  perturbation  study  of  the  NLR  technique  also 
provides  information  about  the  ability  of  NLR  to  converge  to 
the  correct  parameter  set  when  the  initial  parameters  are  in 
error.  The  perturbation  analysis  is  baaed  on  the  following: 

1)  A set  of  q AR  parameters  and 

7 7 

variances  and  a ft>*e  selected  to  characterize 
an  AR  process  in  additive  noise. 

2)  Using  equation  30)  of  the  previous  section,  the 

q 

(Ryy(k))^  are  calculated  from  the  q*2  parameters 


selected  In  step  1). 

3)  With  z = f_(0)  + ^ describing  the  nonlinear  mapping 

q 

from  0 to  z,  construct  z from  the  {a(i))i  and 

<"yy  • 

>t)  The  parameters  of  £ are  the  {aCi))^,  o^  and  o^. 
The  perturbation  will  be  Introduced  into  0. 

5)  Prior  to  entering  the  NLR  routine,  some  or  all  of 
the  parameters  of  0,  are  perturbed. 

6)  The  NLR  iterative  procedure  then  attempts  to 
correct  0 so  that  £ 0. 

For  the  q = 1 case,  an  analytical  development  of  the  NLR 
technique  is  possible,  and  the  results  can  be  used  to 
predict  the  findings  of  the  computer  analysis. 

The  following  points  can  be  deduced 
theoretical  study: 

1)  For  errors  in  either  or 

2 2 

the  0 and  o parameters  of  0 , the 

e n — 

corrected  in  one  Iteration  without 
errors  in  any  other  component  of  £. 

2)  Any  error  in  the  single  AR  parameter  a(1),  whether 

2 ? 

accompanied  by  errors  in  o or  a , or  not,  is 

t n 

corrected  in  one  Iteration.  The  first  Iteration, 

2 

however,  produces  an  error  in  the  component 
of  £.  It  thus  requires  one  additional  iteration 
to  correct  that  error. 

3)  If  the  perturbation  of  the  a(1)  parameter  results 

in  an  initial  value  of  a(1)  s o,  then  the 


from  the 

both  of 
errors  are 
Introducing 
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theoretical  development  predicts  that  the  NLR 


technique  based  on  the  Gauss-Newton  method  will 
fall  on  the  first  iteration. 

Computer  simulation  of  the  perturbation  analysis  for  the 
q = 1 case  verified  the  above  theoretical  results  with  the 
following  qualification:  the  NLR  Gauss-Newton  technique 

Involves  the  inversion  of  a matrix  which  becomes 

2 -2 
ill-conditioned  as  , the  actual  noise  variance,  or  Op , the 

estimate  of  that  variance,  get  large.  This  observation 

results  from  the  finite  word  length  and  round-off  effects 

associated  with  computers.  This  is  not  predicted  by  the 

theoretical  analysis,  though  it  is  apparent  from  the  matrix 

equations  associated  with  this  technique  why  this  would 

happen.  Its  presence  in  the  simulation  is  the  Justification 

for  using  small  variance  sequences  for  e(k)  and  n(k) — on 

the  order  of  100.  or  less.  To  prevent  the  increase  of  the 

effects  of  quantization  noise,  data  must  be  stored  on  disk 

in  the  unpacked  format  of  128  data  points  per  disk  block. 


Tables  1 and  II  show  perturbation  effects  for  che  q s 1 
case  for  the  parameters  listed  in  the  tables.  The  examples 
in  these  tables  represent  the  worst  case  in  which  there  are 
errors  in  all  three  parameters.  For  demonstration  purposes, 
the  errors  are  *200%  for  each  parameter.  The  value  of  0 at 
each  Iteration  is  given  to  four  decimal  places. 


Table  I 

Perturbation  Analysis,  q = 1 


a(1)  _ 

2 

o 

£ 

2 

^n 

Dist 

Design  Parameters 

0.8 

1.0 

1.95 

- 

Initial  Estimates 

2.4 

3.0 

5.85 

2.56 

Iteration  #1 

0.8000 

18.3679 

4.5499 

7.52  X 

10'^’ 

#2 

0.8000 

1.0000 

1 . 9500 

8.44  X 

o 

1 

Table  II 

Perturbation  Analysis,  q = 1 

2 2 
a(l)  • % 

Dist 

Design  Parameters 

-0.8 

1.0 

3.03 

- 

Initial  Estimates 

-2.4 

3.0 

9.09 

2.56 

Iteration  #1 

-0.8000 

27.9869 

7.0699 

2.44  X 10‘^° 

#2 

-0.8000 

1.0000 

3.0301 

3.09  X 10"^’ 

1 


Another  more  interesting  perturbation  study  is  shown  in 

Table  III.  In  this  case,  the  AR  coefficients  are  a(1)  = 

-2.3t  a(2)  r 2.4,  a(3)  = -1.6,  and  a(4)  = 0.6.  For  purposes 

of  demonstration,  the  initial  estimates  are  a{i)  = -a(i),  a 

-200%  error  in  estimating  each  coefficient.  The  actual 

variances  and  their  estimates  are  shown  in  Table  III,  which 

Illustrates  the  convergence  of  0 to  the  correct  solution. 

From  the  table,  it  is  evident  that,  to  four  decimal  places, 

the  convergence  for  the  four  AR  coefficients  is  complete  in 

10  iterations.  Note  that  in  Tables  I,  II,  and  III,  the 

entry  for  "Dist"  under  each  iteration  is  the  I2  distance 

q 

between  the  (a(i))9  and  the  {a(l))^.  The  noise  variance 

2 

Op  in  each  of  the  three  perturbation  examples  corresponds  to 
a 0 dB  SNR  when  compared  to  the  sample  variance  of  the 
respective  AR  process. 

While  the  preceding  discussion  demonstrates  the  power 
of  the  NLR  technique,  using  a Gauss-Newton  approach,  the 
following  is  an  initial  attempt  at  applying  the  entire 
procedure  to  the  synthetic  AR(1)  sequences.  The  results 
seem  to  indicate  two  major  problems: 

1)  The  initial  estimates  for  the  {a(l))  are  poor. 
Hannan's  method  begins  by  estimating  the  AR 

parameters  by 

? a(i)R,  (1-k)  . - (k).  k « q+1 2q.  39) 

The  next  step  in  Hannan's  procedure  is  to 
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calculate  the  approximate  autocorrelation  function 
of  the  moving  average  sequence  y(k): 


(k) 


1 I a(i)a(j)R(k+i-j),  k = 0,  ....  q. 
i=0  j=0 


41) 


If  the  spectrum  Syy(u))  correspondine  to  R yy(  k ) is 
non-negative,  the  spectrum  can  be  factored  to  give 

q 

the  {b(i))^  , initial  estimates  of  the  MA 
parameters.  This  spectral  factorization 

represents  the  solution  of 


^ (k) 
yy 


o;  I b(i)b(i+v),  b(o)  = i. 

" i=0 


41) 


- q -2 

for  the  {b(l))i  and  . This  solution  is 
nonlinear  in  nature.  The  procedure  proposed  by 
Hannan  then  uses  an  Iterative  stage  to  Improve 
these  Initial  estimates.  Presently,  the  only 
steps  of  this  technique  being  used  are  the 

_ q - q 

calculation  of  the  {a(i))i  and  the  {RyyCk))^  using 
equations  40)  and  41).  This  would  avoid  the 
nonlinear  spectral  factorization  necessary  to 

P 2 

obtain  the  {b(i))|  and  ^ , which  are  then  , 

I n 

recombined  by  41)  to  obtain  the  {Ryy(k))p  for  the  | 

Gauss-Newton  NLR  routine.  However,  further 
improvement  on  these  parameters  may  be  necessary 

to  prevent  the  NLR  technique  from  diverging. 


87 


SECTION  IV 


MULTIRATE  SIGNAL  PROCESSING 

H . Ra V indra 

Abstract 


The  aim  of  this  project  is  to  simulate  a system  on  a 
Digital  Computer,  which  can  increase  or  decrease  the 
sampling  rate  of  an  acoustic  signal.  These  two  operations 
are  called  Interpolation  and  Decimation  respectively.  This 
project  is  the  first  phase  of  a larger  project  which 
involves  the  simulation  of  a CVSD  system,  with  an  idea  of 
studying  the  problems  of  tandeming.  Since  the  CVSD 
performance  ( signal-to-noise  ratio)  is  better  at  higher 
sampling  rates,  an  Interpolation/Decimation  scheme  is 
required  to  translate  the  sampling  rate  from  the  lower  6.67 
KHz  to  higher  rates. 
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In  t reduction 


In  many  digital  signal  processing  systems,  e.g., 
vocoders,  modulation  systems  and  digital  waveform  coding 
systems,  it  is  necessary  to  alter  the  sampling  rate  of  a 
digital  signal.  In  the  present  context,  we  are  interested 
in  the  last  application.  The  multirate  signal  processing 
system  is  actually  the  first  phase  of  a larger  project.  The 
main  project  involves  studying  the  problems  of  tandeming  a 
evSD  system  with  a vocoder.  This  requires  simulation  of  two 
systems  on  a digital  computer.  They  are,  the  multirate 
signal  processing  system  and  a CVSD  encoder /de coder  system. 
In  the  present  write  up,  the  multirate  signal  processing 
system  is  described.  Firstly,  interpolation  and  decimation 
of  the  sampling  rate  of  a digital  signal  are  shown  to  be 
simple  linear  filtering  processes.  Later,  it  will  be  shown 
that  an  interpolator  or  a decimator  can  be  implemented 
optimally  over  several  stages.  Also,  the  suitability  of  FIR 
filters  over  HR  filters  for  this  application  will  be 
discussed.  In  the  present  work,  both  non-optimal 
(single-stage)  and  optimal  (multi-stage)  interpolators  and 
declmators  have  been  Implemented  and  comparative  results  are 
presented  at  the  end  of  the  write  up. 
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Interpolation  and  Decimation  as.  Linear  Filtering 


Let  x(t)  be  a continuous  time  signal  and  xln) 
the  sampled  version  of  x(t) 

1 . e . , 

x{n)  = x(nT)  where  T is  the  sampling  period. 

It  can  be  shown  that  the  Fourier  Transform  of  x(t)  and 
x(n)  are  related  as  follows: 


= i I X(a.  + k^) 
' k=-»  ' 


If  x(t)  Is  band  limited,  i.e.,  x(  u>  ) 


0 for 


I (D I ^ and  1- f T ^ ^ (to  avoid  aliasing),  then 

= j X(a.)  "y  1 1 f. 

(a)  Sampling  rate  reduction  b v integer  factors : 

Suppose  that  the  desired  sampling  period  is 
T’  = MT . If  M is  an  integer,  then. 


y(n)  » x(nT' ) = x(nMT) 
* X (Mn) . 
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So,  decimation  by  an  integer  factor  M is  achieved 

simply  by  "picking  off"  every  Mth  sample  from  the 

original  signal  sampled  at  a rate  of  i . Before 

T 

applying  the  decimation  process,  it  may  be  necessary 
to  perform  low  pass  filtering  of  the  original  signal 
x(n),  to  avoid  aliasing,  as  shown  below. 

It  is  seen  that  the  Fourier  Transform  of  the 
decimated  signal  is, 

t = 0 

so,  unless  (i.e.,  aliasing  results 

and  it  is  not  possible  to  recover  the  original  signal 
from  the  sampled  (decimated)  version.  Hence, 
prefiltering  (lowpass)  is  needed  when  T'  > n/n 

(i.e.,  T > to  avoid  aliasing.  The  low  pass 

filter  must  have  a cutoff  frequency  of  n/Mfl. 

(b)  LMlS.  Increase  ^ integer  factors ; 

Let  T*  * T/L.  If  L is  an  Integer,  the  new 
sampling  rate  1/T'  would  be  equal  to  the  original  rate 
1/T  multiplied  by  the  factor  L.  It  has  been  shown 
that  Interpolation  involves  two  steps.  In  the  first 
step,  L-1  zero  samples  are  Introduced  between  samples 
of  the  original  signal  and  in  step  2,  the  resulting 
signal  is  lowpass  filtered. 
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The  Fourier  transform  of  the  original  signal 


padded  with  zeros  can  be  shown  to  be, 


and  it  is  seen  that  it  is  periodic  with  a period  of 
^tt/T  . But  the  properly  interpolated  signal  should  be 
periodic  with  a period  of  2tt/T'.  Therefore,  low  pass 
filtering  is  needed,  after  padding  with  zeros,  to  keep 
only  the  base  band  and  attenuate  the  inner  lobes. 
Also,  it  is  clear  that  the  passband  gain  of  the  filter 
should  be  equal  to  the  interpolation  ratio. 

( c ) Changing  bv  non-integer  factors ; 

Let  the  new  sampling  period  be  T'  = pT , where  M 
and  L are  integers.  An  interpolation  or  decimation  by 
a non- integer  factor  can  be  realized  by  first 
Interpolating  by  the  factor  L,  and  then  decimating  by 
a factor  M.  If  the  overall  factor  by  which  the  rate 
is  changed  is  less  than  unity,  LP  filtering  is  needed 
before  the  decimation  step  to  avoid  aliasing. 

II . Selection  of  the  Tvoe  of  Filter; 

The  choice  is  between  FIR  and  HR  filters.  The 
following  advantages  of  the  FIR  filter  make  this  class 
more  suitable  then  the  HR  filters. 
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(i)  FIR  filters  can  be 


rea 1 Ized 


with 


precisely  linear  phase  and  can  be  made  to  approach  the 
ideal  magnitude  response  by  reducing  the  stop  band  and 
passband  ripples,  and  also  by  reducing  the  transition 
bandwidth.  Of  course,  this  requires  long  filters. 
Though  HR  filters  can  be  realized  with  extremely  good 
magnitude  response,  they  suffer  from  non-linear  phase 
characteristics. 

(ii)  It  can  be  shown  that  the  post 
filtering  required  for  interpolation  and  prefiltering 
required  for  decimation  can  be  combined  to  form  a 
single  filter.  This  filter  accepts  zero  padded  signal 
and  generates  the  final  output  - (Interpolated  and 
decimated,  to  realize  a non-integer  change  in  rate). 
If  FIR  filters  are  used,  the  fact  that  the  filter  sees 
one  nonzero  sample  in  every  L samples  and  produces  an 
output  sample  in  every  M samples,  results  in  reduced 
computational  complexity.  Also,  during  the  decimation 
process,  the  symmetry  of  the  filter  can  be  utilized  to 
reduce  computation  by  a factor  of  2. 

Ill . Optimal  Design  of  Interpolators  and  Decimators 


Crochiere  and 

Rabiner 

have  shown 

that 

the 

post 

filter  required 

in 

the  interpolation 

step 

and 

prefilter  required 

in 

the 

decimation 

step 

, can 

be 

combined  into  a single  filter  to  realize  a non-integer 
ratio  of  change  of  rate.  Also,  they  show  that  the 
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amount  of  computation  can  be  reduced  considerably  by 
implementing  interpolators  and  decimators  in  a 
multistage  configuration,  and  develop  techniques  for 
their  optimal  design. 

(a)  The  physical  reason  for  reduced  computation 
when  multistage  implementation  is  used,  is  the 
f ollowing . 

Since  the  FIR  filter  accepts  a zero  padded 
signal,  it  sees  one  nonzero  sample  in  every  L samples. 
Also  since  it  is  performing  predecimation  low  pass 
filtering,  it  need  output  one  sample  in  every  M 
samples.  Therefore,  the  computational  complexity 
(number  of  multiplies  and  adds:  MADS)  is  proportional 
to  N/(LM).  That  is,  the  number  of  MADS  required  to 
generate  each  output  point  is  N/(LM).  It  can  be  seen 
from  Figure  ^ that  a multistage  implementation  of  a 
decimator  requires  the  longest  filter  for  the  last 
stage  and  relatively  very  short  filters  for  the 
earlier  stages.  The  opposite  is  true  of 
interpolators.  For  the  same  end  results,  the  ripple 
specifications  on  each  state  is  more  severe  (by  a 
factor  equal  to  number  of  stages)  than  in  the  case  of 
a single  stage  implementation.  But  it  is  shown  that 
this  affects  the  total  computation  by  a small  amount. 
Since  the  filter  in  the  last  stage  (longest)  of  a 
multistage  decimator  and  the  filter  required  by  a 
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single  stage  declinator  compare  as  shown  In  Figures  1 
and  2,  It  Is  seen  that  the  single  stage  filter  Is  much 
longer.  So,  by  realizing  a large  amount  of  decimation 
over  the  earlier  stages  and  a small  amount  (so  that 
both  L and  M are  large  for  the  last  stage)  over  the 
last  stage,  the  number  of  MADS  can  be  reduced 
considerably.  Very  large  reduction  In  the  number  of 
MADS  required,  is  realized  when  the  Interpolation  or 
decimation  ratio  is  greater  than  about  20. 

( b ) Design  procedure  ( optimal  dec imator ) 

The  number  of  MADS  required  per  second  is  shown 
to  be , 

Rj  = S(MADS) 


where,  5s)  is  a function  of  stopband 

(6s)  and  passband  (5p)  variances,  and  the  number  of 
stages,  K.  is  the  initial  sampling  rate  and 

S(MADS)  is  a function  of  the  decimation  ratios  of  the 
K stages,  and  the  transition  bandwidth. 

Crochiere  and  Rabiner  have  shown  that  D«  is  not 
very  sensitive  to  the  number  of  stages  (Reference:  < 

Table  1),  but  the  function  S(MADS)  is.  So,  they  have 
developed  various  design  curves  which  can  be  used  to 

I 

( I 


95 


find  the  optimum  number  of  stages  and  an  optimal  set 
of  values  for  the  decimation  ratios  for  the  K stages. 
For  a two  stage  design,  solution  is  available  in 
closed  form  as 


•^lOPT 


2D(1  - /DAf/(2-Af)) 
2 - Af(1  + D) 


and 


0 


20PT 


^/‘^lOPT’ 


f s - f D 
fs 


and  D = required  decimation  ratio 


The  design  procedure  for  a general  K stage 
decimator  follows. 

The  specifications  are  6p,  6s,  Af,  D and  fpQ.  It 
has  been  shown  that  the  use  of  more  than  four  stages 
will  only  increase  the  complexity  of  implementation 
and  will  not  reduce  computation  any  further. 
Therefore,  the  values  of  D„  for  the  specified  values 
of  6p,  6s  are  found  for  K * 1,2,3  and  U from  Table  1. 
Referring  to  Figure  5,  the  value  of  s,  for  the 
specified  values  of  D and  f,  is  found  for  each  value 
of  K=1  through  U.  The  value  of  R.|.  is  then  computed 
for  each  K.  Obviously,  the  value  of  K,  which  results 


in  minimum  value  for  R^,  is  the  optimal  value.  From 
the  graphs  in  Figure  6,  the  decimation  ratio  for  each 
stage  can  be  found  from  the  specified  values  for  D and 
Af. 


The  same  design  curves  may  be  used  for  the  design 
of  an  optimal  interpolator  as  the  processes  of 
interpolation  and  decimation  are  duals  to  each  other. 

IV . Practical  Consideration  in  the  Implementation  of 

Mult istage  Decimator  and  Interpolators : 

An  implementation  strategy  is  described  by 
Crochiere  and  Rabiner  which  automatically  takes  care 
of  the  presence  of  L-1  zero  samples  between  non-zero 
samples  in  the  input,  without  actually  checking  for 
zeros  and  also  generates  only  one  output  point  for 
every  M samples. 

If  the  length  of  the  unit  sample  response  of  the 
LP  filter,  N,  is  chosen  such  that 
N r QL  (where  L in  the  decimation  or  interpolation 
ratio  and  Q Is  an  integer), 

then,  exactly  Q non-zero  samples  of  the  input  sequence 
(effectively  padded  with  zeros)  are  spanned  by  the  unit 
sample  response. 
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Then,  the  output  sequence  is  given  by 


y(n)  h(kL+(nm)  @ L)x([['— ] - k) 

k = 0 


where  h(n)  is  the  filter  impulse  response,  ( ) @ L implies 
the  quantity  in  parentheses  modulo  L and  [^]  is  the 
integer  value  of  nM/L.  From  this,  it  is  seen  that  the  input 
sequence  is  to  be  sequentially  addressed  for  Q of  its 
values,  to  generate  one  output  point.  Also,  the  filter  unit 
sample  response  must  be  addressed  by  (KL+(nM)  @ L).  But,  if 
h(n)  is  stored  in  a scrambled  order:  h(0),  h(L),..., 
h((0-1)L),  h(M0L),  h(L+M@L),  . . . , h ( ( 0- 1 ) L^M  @ L ) , 

,h( ((L-1)M)  @ L) , h(L+((L-1 )M)  @ L)  , 

h ( ( 0- 1 ) L+ ( ( L- 1 ) M ) @ L),  then  it  can  be  addressed 
sequentially  for  Q of  its  values  for  computing  each  output 
sample . 
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Presen  t Wgr.h 


V . 

In  the  present  work,  both  non-optlmal  (single-stage) 
and  optimal  (multi-stage)  interpolators  and  decimators  have 
been  implemented  for  interpolation  ratios  of  3/2,  3 and  5 to 
get  at  sampling  rates  of  10  KHz,  20  KHz,  and  33  KHz, 
starting  from  6.67  KHz,  and  decimation  ratios  of  3/2,  3 and 
5 to  get  back  to  sampling  rates  of  6.67  KHz. 

From  the  design  curves,  it  was  determined  that  two 
stage  Implementation  was  optimal  for  ratios  of  3 and  5,  and 
one  stage  for  a ratio  of  3/2.  Optimal  values  for  the 
interpolation  (decimation)  ratios  for  both  the  stages  are 
calculated  and  filters  specified.  The  filters  were  designed 
using  the  REMEZ  exchange  algorithm.  In  the  multistage 
implementation,  the  filters  are  scrambled  as  explained  in 
the  previous  section  and  stored  on  files.  The  two 
implementations  were  tested  for  run  times  and  results 


fo llow. 

Integer  ratio 

Single  stage 
interpolator 

Mu  1 1 i s t a p e 
interpolator 

3/2 

70  seconds 

20  seconds 

3 

72  seconds 
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150  seconds 

34  seconds  ; 
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The  spectra  of  original  speech  and  Interpolated  speech 
signals  are  presented  In  Figure  7 which  clearly  show  the 
frequency  domain  effects  of  Interpolation.  The  language 
used  for  Implementation  Is  FORTRAN.  (Listings  of 
single-stage  Interpolator  and  multi-stage  Interpolator  and 
declmators  are  enclosed.) 

V I . Suggestions  for  Improvement 

Goodman  and  Carey  propose  a set  of  nine  FIR  filters  of 
which  eight  are  halfband  filters,  to  be  used  In  proper 
sequence,  along  with  resampling  to  realize  a wide  range  of 
accurate  Interpolation  and  decimation  ratios.  It  Is  claimed 
that  the  filters  are  efficient  in  terms  of  number  of 
multiplications,  since  most  of  the  filters  In  the  set  are 
half  band  filters  (In  which  nearly  half  of  the  Impulse 
response  coefficients  are  zero)  and  that  the  required  filter 
sequence  can  be  designed  without  a computer. 
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snci'ioN  V 


CONTINUOUSLY  VARIABLE  SLOPE  DELTA  MODULATOR 

H.  Ravindra 


I . Introduction 

Delta  modulation  is  a waveform  encoding  technique  which 
is  characterized  by  one  bit  information  over  each  clock 
period.  In  its  simplest  form,  the  modulator  consists  of  a 
comparator  (Ref.  Figure  1),  sampler  and  an  accumulator. 
The  comparator  compares  the  input  signal  and  the  accumalator 
contents,  and  produces  an  error  signal  at  the  data  sampling 
rate.  The  sampler  generates  a one  bit  output  1 or  0 
depending  on  whether  the  error  signal  is  positive  or 
negative  respectively.  A fixed  step.  A,  is  either  added  to 
or  subtracted  from  the  accumulator  when  the  output  bit  is 
a 1 or  0 respectively  - then  the  accumulator  content  will  be 
an  approximation  to  the  original  speech  signal.  So,  the  bit 
stream  output  by  the  encoder  represents  an  approximation  for 
the  original  signal.  The  decoder  is  similar  to  the  encoder 
except  for  the  absence  of  the  comparator  and  the  presence  of 
a LP  filter  at  the  output.  It  has  been  shown  that  a simple 
scheme  like  this  suffers  from  two  main  disadvantages.  They 
are  : 

(i)  slope  overload  noise  and 
(ii)  granular  noise. 


101 


Slope  overload  results  if  the  Input  speech  signal  to 
the  encoder  varies  so  fast  that  the  encoder  cannot  track  it 
well  and  granular  noise  results  due  to  the  alternation  of  1 
and  0 to  represent  a constant  level.  These  noises  can  be 
reduced  considerably  by  adapting  the  step  size  dynamically 
so  that  when  the  slope  is  high,  the  step  size  increases  and 
when  the  slope  is  very  low,  the  step  size  decreases.  In  the 
digital  adaptive  delta  modulators,  the  present  and  previous 
bits  are  compared  and  if  they  are  the  same  (indicating  slope 
overload),  the  step  size  is  increased  and  if  they  are 
opposite,  the  step  size  is  reduced.  In  a CVSD  encoder,  the 
step  size  is  adapted  more  smoothly  in  time,  with  a time 
constant  that  is  of  the  order  of  5 - 10  ms.  to  match  the 
syllabic  rate  of  speech.  It  has  been  claimed  that  CVSD 
processed  speech  sounds  "cleaner"  at  bit  rates  of  25  K 
bits/second  and  lower.  Also,  this  method  provides  for  an 
increased  resistance  to  bit  errors.  But  the  price  paid  is 
increased  slope  overload  distortion  compared  with 
Instantaneous  compandors. 

1 1 . CVSD  Encoder  and  Decoder  f Reference : Figure  1 1 

The  comparator  compares  the  audio  input  with  the  output 
of  the  integrator  and  produces  an  error  signal.  The  sampler 
produces  a 1 or  0 depending  on  whether  an  error  signal  is 
positive  or  negative  respectively  at  that  clock  instant. 
The  algorithm  detects  three  like  consecutive  bits  and 
outputs  a M'  and  a *0'  otherwise.  The  algorithm  output  is 
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at  clock  rate.  The  slope  command  low  pass  filters  the 
algorithm  output  and  produces  a signal  which  acts  as  the 
modulating  signal  in  the  PAM.  The  slope  command  LP  filter 
is  selected  such  that  its  time  constant  matches  the  syllabic 
rate  of  speech.  As  can  be  seen,  the  modulating  signal 
increases  exponentially  if  there  is  a slope  overload,  and 
hence  results  in  increasd  step  size.  The  PAM  modulates  the 
sampler  output  with  the  slope  command  output.  The 
integrator  constructs  an  approximation  for  the  original 
audio  signal.  The  decoder  is  similar  to  the  encoder  except 
for  the  absence  of  the  comparator.  It  is  suggested  that  the 
slope  command  filter  should  have  a pass  band  width  of  25  to 
40  Hz  and  that  the  compression  ratio  should  be  12:1. 

III.  Present  Implementation 

The  CVSD  system  is  simulated  on  a digital  computer. 
The  algorithm  is  implemented  by  a 3 bit  Shift  Register  which 
stores  the  present  bit  and  the  past  two  bits  and  produces  an 
output  of  *1*  if  all  the  bits  are  alike  and  *0*  otherwise. 
The  slope  command  filter  is  a digital  LP  filter  with  a 
passband  cutoff  between  25  and  40  Hz  and  a roll  off  rate  of 
20dB  per  decade.  The  slope  command  filter  is  convolved  with 
the  algorithm  output  to  generate  the  slope  command  signal. 
This  signal  modulates  the  sampler  output  and  is  summed  up 
with  the  proper  sign  in  the  accumulator,  and  then  low  pass 
filtered  to  generate  an  approximation  for  the  original  audio 
signal.  Since  this  la  a feedback  system,  it  is  necessary  to 
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generate  outputs  from  the  slope  command  module,  the  PAM  and 
the  Integrator  one  point  at  a time.  So,  the  convolutions 
mentioned  earlier  are  carried  out  such  that  one  point  is 
generated  and  output  from  the  module  at  a time.  Two 
operations  are  to  be  performed  on  the  slope  command  output 
before  it  is  sent  to  the  PAM.  They  are, 

(i)  Biasing : When  the  speech  input  is  zero,  the 
sampler  output  will  be  a sequence  of  alternating  1's  and 
O's.  So,  the  modulating  signal  will  be  zero  and  this 
requires  the  addition  of  a bias  to  the  slope  command  signal 
so  that  the  approximation  to  a constant  level  is  a train  of 
alternating  1's  and  O's  of  small  magnitude. 

(11)  Amplification ; When  the  slope  of  the  input 
signal  is  large  over  a long  period  of  time,  the  slope 
command  output  saturates  at  the  maximum  possible  value. 
This  value  may  not  be  sufficient  to  provide  for  closer 
tracking  of  the  high  slope  signal.  Therefore  the  slope 
command  output  must  be  amplified. 

The  biasing  and  amplification  can  be  adjusted  so  that 
the  compression  ratio  is  12:1  as  specified. 

The  FIR  filters  are  designed  using  the  REMEZ  exchange 
algorithm.  The  system  reads  in  speech  from  a file,  encodes 
it  and  passes  the  encoded  output  into  the  decoder  which 
decodes  the  encoded  speech  and  produces  the  output  speech 
f which  is  written  onto  a file. 

i 
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Weaulta; 

At  this  stage,  the  encoder  and  decoder  are  logically 
working  correctly.  It  looks  as  if  it  is  necessary  to  design 
and  use  separate  slope  command  filters  for  the  three 

possible  sampling  rates.  Each  filter  can  then  be  tuned  up 

so  that  it  matches  the  syllabic  rate  of  speech  and  also  the 
slope  command  output  can  be  properly  biased  and  amplified. 
The  observation  with  a single  filter  designed  for  a passband 
cut  off  at  32  Hz  based  on  a sampling  rate  of  20  KHz  leads  to 
heavy  slope  overload  when  the  sampling  rate  is  lower.  The 

integrator  filter  should  be  such  that  the  base  band  of  the 

reconstructed  speech  passes  through. 

The  output  of  each  module  is  presented  in  Figure  3. 


(a)  Input  speech  at  33  KHz  (b)  Encoder  output  at  33  KHz 


(c)  Decoder  output  at  33  Khz 


Figure  3:  Input  and  outputs  of  the  CVSD  System 


